Proceedings of the
Tenth IEEE Workshop on
Statistical Signal and Array Processing
Sponsored by
The IEEE Signal Processing Society
August 14-16, 2000
Pocono Manor Inn, Pocono Manor, Pennsylvania, USA
Supported by
Office of Naval Research, USA
Air Force Research Laboratory, USA
Villanova University, USA
REPORT DOCUMENTATION PAGE
Form Approved
OMB No. 0704-0188
Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection
of information, including suggestions for reducing this burden to Washington Headquarters Service, Directorate for Information Operations and Reports,
1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget,
Paperwork Reduction Project (0704-0188) Washington, DC 20503.
PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.
1. REPORT DATE (DD-MM-YYYY)
15-10-2000
2. REPORT DATE
3. DATES COVERED (From - To)
January 2000 - September 2000
4. TITLE AND SUBTITLE
THE 10TH IEEE SIGNAL PROCESSING
WORKSHOP ON STATISTICAL SIGNAL
AND ARRAY PROCESSING
5a. CONTRACT NUMBER
5b. GRANT NUMBER
N00014-00-1-0014
5c. PROGRAM ELEMENT NUMBER
5d. PROJECT NUMBER
5e. TASK NUMBER
5f. WORK UNIT NUMBER
6. AUTHOR(S)
Moeness G. Amin
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Villanova University
800 Lancaster Ave
Villanova, PA 19085
8. PERFORMING ORGANIZATION REPORT NUMBER
Acc: 527639
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
Office of Naval Research, Program Officer: W. Miceli
Ballston Center Tower One
800 North Quincy Street
Arlington, VA 22217-5660
10. SPONSOR/MONITOR'S ACRONYM(S)
11. SPONSORING/MONITORING AGENCY REPORT NUMBER
12. DISTRIBUTION AVAILABILITY STATEMENT
Approved for Public Release; Distribution is Unlimited
14. ABSTRACT
This is the Proceedings of the 10th IEEE Workshop on Statistical Signal and Array Processing (SSAP), which was held at the Pocono
Manor Inn, Pocono Manor, PA, during the period of August 14th-16th, 2000. The Workshop featured four keynote speakers whose talks
covered the areas of Radar and Sonar Signal Processing; Time-Delay Estimation; Space-Time Codes; and Multi-carrier CDMA. The
Workshop offered traditional and new research topics. It included one session on Radar Signal Processing, one session on Signal
Processing for GPS, one session on Network Traffic Modeling, one session on Statistical Signal Processing, one session on Acoustical
Signal Processing, two sessions on Time-Frequency Analysis, two sessions on Array Processing, three sessions on Second and Higher
Order Statistics, and four sessions on Signal Processing for Communications. The workshop received the highest number of paper
submissions compared to previous workshops in the same area, and the technical committee carefully selected high quality papers for
presentations. The 2000 IEEE-SSAP Workshop was a tremendous success in all aspects.
15. SUBJECT TERMS
Radar Signal Processing, Statistical Signal Processing, Signal Processing for Communications,
Time-Frequency Analysis, Array Processing
16. SECURITY CLASSIFICATION OF:
a. REPORT
U
b. ABSTRACT
U
c. THIS PAGE
U
17. LIMITATION OF ABSTRACT
18. NUMBER OF PAGES
19a. NAME OF RESPONSIBLE PERSON
19b. TELEPHONE NUMBER (Include area code)
Standard Form 298 (Rev. 8-98)
Prescribed by ANSI-Std Z39-18
Proceedings of the Tenth IEEE Workshop on Statistical Signal and Array Processing
Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are
permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those
articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee
indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA
01923. For other copying, reprint or republication permission, write to IEEE Copyrights Manager,
IEEE Operations Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. All rights
reserved. Copyright © 2000 by The Institute of Electrical and Electronics Engineers, Inc.
IEEE Catalog Number: 00TH8496
ISBN: 0-7803-5988-7 (hardbound)
Library of Congress Number: 99-69422
IEEE SSAP-2000 Workshop Committee
General and Organizational Chair
Moeness Amin
Villanova University, USA
e-mail:moeness@ece.vill.edu
Technical Chair
Mike Zoltowski
Purdue University, USA
e-mail:mikedz@ecn.purdue.edu
Finance
Kevin Buckley
Villanova University, USA
e-mail:buckley@ece.vill.edu
Publicity
Rick Blum
Lehigh University, USA
e-mail:rblum@EECS.Lehigh.EDU
Proceedings
Bill Jemison
Lafayette College, USA
e-mail: w.d.jemison@ieee.org
Local Arrangement
Wojtek Berger
University of Scranton, USA
e-mail:Berger@Scranton.edu
Asian Liaison
Rahim Leyman
e-mail:EARLEYMAN@ntu.edu.sg
Australian Liaison
Abdelhak Zoubir
e-mail:zoubir@mail.atri.curtin.edu.au
European Liaison
Pierre Comon
e-mail:Pierre.Comon@i3s.unice.fr
Table of Contents
Session MA-1. SIGNAL PROCESSING FOR COMMUNICATIONS I
Multistage Multiuser Detection for CDMA with Space-Time Coding
Y. Zhang and R. S. Blum — Lehigh University . 1
Adaptive MAP Multi-User Detection for Fading CDMA Channels
C. Andrieu and A. Doucet — Cambridge University, UK
A. Touzni — NxtWave Communications . 6
Analysis of a Subspace Channel Estimation Technique for Multicarrier CDMA Systems
C. J. Escudero, D. I. Iglesia, M. F. Bugallo, and L. Castedo — Universidad de La Coruña, Spain . 10
Blind Adaptive Asynchronous CDMA Multiuser Detector Using Prediction Least Mean Kurtosis Algorithm
K. Wang and Y. Bar-Ness — New Jersey Institute of Technology . 15
MMSE Equalization for Forward Link in 3G CDMA: Symbol-Level Versus Chip-Level
T. P. Krauss, W. J. Hillery, and M. D. Zoltowski — Purdue University . 18
Transform Domain Array Processing for CDMA Systems
Y. Zhang and M. G. Amin — Villanova University
K. Yang — ATR Adaptive Communications Research Laboratories, Japan . 23
Sectorized Space-Time Adaptive Processing for CDMA Systems
K. Yang, Y. Mizuguchi — ATR Adaptive Communications Research Laboratories, Japan
Y. Zhang — Villanova University . 28
Demodulation of Amplitude Modulated Signals in the Presence of Multipath
Z. Xu and P. Liu — University of California . 33
Multichannel and Block Based Precoding Methods for Fixed Point Equalization of Nonlinear Communication
Channels
A. J. Redfern — Texas Instruments
G. T. Zhou — Georgia Institute of Technology . 38
Joint Estimation of Propagation Parameters in Multicarrier Systems
S. Aouada and A. Belouchrani — Ecole Nationale Polytechnique, Algeria . 43
OFDM Spectral Characterization: Estimation of the Bandwidth and the Number of Sub-Carriers
W. Akmouche — CELAR, France
E. Kerherve and A. Quinquis — ENSIETA, France . 48
Blind Source Separation of Nonstationary Convolutively Mixed Signals
B. S. Krongold and D. L. Jones — University of Illinois at Urbana-Champaign . 53
A Versatile Spatio-Temporal Correlation Function for Mobile Fading Channels with Non-Isotropic Scattering
A. Abdi and M. Kaveh — University of Minnesota . 58
Session MA-2. ARRAY PROCESSING I
A Batch Subspace ICA Algorithm
A. Mansour and N. Ohnishi — RIKEN, Japan . 63
Comparative Study of Two-Dimensional Maximum Likelihood and Interpolated Root-Music with Application to
Teleseismic Source Localization
P.J. Chung and J. F. Bohme — Ruhr University, Germany
A. B. Gershman — McMaster University, Canada . 68
Bounds on Uncalibrated Array Signal Processing
B. M. Sadler — Army Research Laboratory
R. J. Kozick — Bucknell University . 73
Array Processing in the Presence of Unknown Nonuniform Sensor Noise: A Maximum Likelihood Direction Finding
Algorithm and Cramér-Rao Bounds
M. Pesavento and A. B. Gershman — McMaster University, Canada . 78
Matched Symmetrical Subspace Detector
V. S. Golikov and F. C. Pareja — Ciencia y Tecnologia del Mayab, A. C., Mexico . 83
Multiple Source Direction Finding with an Array of M Sensors Using Two Receivers
E. Fishler and H. Messer — Tel Aviv University, Israel . 86
Self-Stabilized Minor Subspace Extraction Algorithm Based on Householder Transformation
K. Abed-Meraim and S. Attallah — National University of Singapore, Singapore
A. Chkeif — Telecom Paris, France
Y. Hua — University of Melbourne, Australia . 90
A Bootstrap Technique for Rank Estimation
P. Pelin, R. Brcich and A. Zoubir — Curtin University of Technology, Australia . 94
Detection-Estimation of More Uncorrelated Sources than Sensors in Noninteger Sparse Linear Antenna Arrays
Y. I. Abramovich and N. K. Spencer — CSSIP, Australia . 99
A New Gerschgorin Radii Based Method for Source Number Detection
H. Wu and C. Chen — Southern Taiwan University of Technology, Taiwan . 104
Session MA-3. SPECTRUM ESTIMATION I
Adapting Multitaper Spectrograms to Local Frequency Modulation
J. W. Pitton — University of Washington . 108
Optimal Subspace Selection for Non-Linear Parameter Estimation Applied to Refractivity from Clutter
S. Kraut and J. Krolik — Duke University . 113
MAP Model Order Selection Rule for 2-D Sinusoids in White Noise
M. A. Kliger and J. M. Francos — Ben-Gurion University, Israel . 118
Optimum Linear Periodically Time-Varying Filter
D. Wei — Drexel University . 123
Fast Approximated Sub-Space Algorithms
M. A. Hasan — University of Minnesota Duluth
A. A. Hasan — College of Electronic Engineering, Libya . 127
Stochastic Algorithms for Marginal MAP Retrieval of Sinusoids in Non-Gaussian Noise
C. Andrieu and A. Doucet — University of Cambridge, UK . 131
Harmonic Analysis Associated with Spatio-Temporal Transformations
J. Leduc — Washington University in Saint Louis . 1 30*
Session MP-1. SIGNAL PROCESSING FOR COMMUNICATIONS II
Blind Noise and Channel Estimation
M. Frikel, W. Utschick, and J. Nossek — Technical University of Munich, Germany . 141
Multiuser Detection in Impulsive Noise via Slowest Descent Search
P. Spasojevic — Rutgers University
X. Wang — Texas A&M University . 146
Maximum Likelihood Delay-Doppler Imaging of Fading Mobile Communication Channels
L. M. Davis — Bell Laboratories, Australia
I. B. Collings — University of Sydney, Australia
R. J. Evans — University of Melbourne, Australia . 151
Enhanced Space-Time Capture Processing for Random Access Channels
A. M. Kuzminskiy, K. Samaras, C. Luschi and P. Strauch — Bell Laboratories, Lucent Technologies, UK . 156
Asymmetric Signaling Constellations for Phase Estimation
T. Thaiupathump, C. D. Murphy and S. A. Kassam — University of Pennsylvania . 161
A Convex Semi-Blind Cost Function for Equalization in Short Burst Communications
K. K. Au and D. Hatzinakos — University of Toronto, Canada . 166
Performance Analysis of Blind Carrier Phase Estimators for General QAM Constellations
E. Serpedin — Texas A&M University
P. Ciblat and P. Loubaton — Université de Marne-la-Vallée, France
G. B. Giannakis — University of Minnesota . 171
Unbiased Parameter Estimation for the Identification of Bilinear Systems
S. Meddeb, J. Y. Tourneret and F. Castanie — ENSEEIHT/TESA, France . 176
Blind Identification of Linear-Quadratic Channels with Usual Communication Inputs
N. Petrochilos — Delft University of Technology, Netherlands
P. Comon — Université de Nice, France . 181
Joint Channel Estimation and Detection for Interference Cancellation in Multi-Channel Systems
C. Martin and B. Ottersten — Royal Institute of Technology (KTH), Sweden . 186
A Spatial Clustering Scheme for Downlink Beamforming In SDMA Mobile Radio
W. Huang and J. F. Doherty — Pennsylvania State University . 191
On the Use of Cyclostationary Filters to Transmit Information
A. Duverdier — CNES, France
B. Lacaze and J. Tourneret — ENSEEIHT/SIC, France . 196
Non-Parametric Trellis Equalization in the Presence of Non-Gaussian Interference
C. Luschi — Bell Laboratories, Lucent Technologies, UK
B. Mulgrew — University of Edinburgh, UK . 201
Analytical Blind Identification of a SISO Communication Channel
O. Grellier and P. Comon — Université de Nice, France . 206
The Role of Second-Order Statistics in Blind Equalization of Nonlinear Channels
R. Lopez-Valcarce and S. Dasgupta — University of Iowa . 211
On Super-Exponential Algorithm, Constant Modulus Algorithm and Inverse Filter Criteria for Blind Equalization
C. Chi, C. Chen and B. Li — National Tsing Hua University, Taiwan . 216
Session MP-2. STATISTICAL SIGNAL PROCESSING
An Efficient Algorithm for Gaussian-Based Signal Decomposition
Z. Hong and B. Zheng — Xidian University, China . 221
Consistent Estimation of Signal Parameters In Non-Stationary Noise
J. Friedmann, E. Fishler and H. Messer — Tel Aviv University, Israel . 225
Channel Order and RMS Delay Spread Estimation for AC Power Line Communications
H. Li — Stevens Institute of Technology
Z. Bi and J. Li — University of Florida
D. Liu — Watson Research Center
P. Stoica — Uppsala University, Sweden . 229
Taylor Series Adaptive Processing
D. J. Rabideau — Massachusetts Institute of Technology . 234
Adaptive Bayesian Signal Processing — A Sequential Monte Carlo Paradigm
X. Wang and R. Chen — Texas A&M University
J. S. Liu — Stanford University . 239
QQ-Plot Based Probability Density Function Estimation
Z. Djurovic and V. Barroso — Instituto Superior Técnico — Instituto de Sistemas e Robótica, Portugal
B. Kovacevic — University of Belgrade, Yugoslavia . 243
Nonlinear System Inversion Applied to Random Variable Generation
A. Pagès-Zamora, M. A. Lagunas and X. Mestre — Universitat Politècnica de Catalunya, Spain . 248
The Numerical Spread as a Measure of Non-Stationarity: Boundary Effects in the Numerical Expected Ambiguity
Function
R. A. Hedges and B. W. Suter — Air Force Research Laboratory IFGC . 252
Locally Stationary Processes
M. E. Oxley and T. F. Reid — Air Force Institute of Technology
B. W. Suter — Air Force Research Laboratory . 257
Statistical Performance Comparison of a Parametric and a Non-Parametric Method for IF Estimation of Random
Amplitude Linear FM Signals in Additive Noise
M. R. Morelande, B. Barkat and A. M. Zoubir — Curtin University of Technology, Australia . 262
Session MP-3. RADAR SIGNAL PROCESSING
The Application of a Nonlinear Inverse Noise Cancellation Technique to Maritime Surveillance Radar
M. R. Cowper and B. Mulgrew — University of Edinburgh, UK . 267
Adaptive Digital Beamforming RADAR for Monopulse Angle Estimation in Jamming
K. Yu — GE Research & Development Center
D. J. Murrow — Lockheed Martin Ocean, Radar & Sensors Systems . 272
Statistical Analysis of SMF Algorithm for Polynomial Phase Signals Analysis
A. Ferrari and G. Alengrin — Université de Nice Sophia-Antipolis, France . 276
Passive Sonar Signature Estimation Using Bispectral Techniques
R. K. Lennartsson, J.W.C. Robinson, and L. Persson — Defence Research Establishment, Sweden
M.J. Hinich — University of Texas at Austin
S. McLaughlin — University of Edinburgh, UK . 281
Approximate CFAR Signal Detection in Strong Low Rank Non-Gaussian Interference
I. P. Kirsteins — Naval Undersea Warfare Center
M. Rangaswamy — ARCON Corporation . 286
Blind Equalization of Phase Aberrations in Coherent Imaging: Medical Ultrasound and SAR
S. D. Silverstein — University of Virginia . 291
False Detection of Chaotic Behaviour in the Stochastic Compound K-Distribution Model of Radar
Sea Clutter
C. P. Unsworth, M.R. Cowper, S. McLaughlin, and B. Mulgrew — University of Edinburgh, UK . 296
Session TA-1. BLIND SOURCE SEPARATION
Recursive Estimator for Separation of Arbitrarily Kurtotic Sources
M. Enescu and V. Koivunen — Helsinki Univ. of Technology, Finland . 301
A Second Order Multi Output Deconvolution (SOMOD) Technique
H. Bousbia-Salah and A. Belouchrani — Ecole Nationale Polytechnique, Algeria . 306
DOA Estimation of Many W-Disjoint Orthogonal Sources from Two Mixtures Using Duet
S. Rickard — Princeton University
F. Dietrich — Siemens Corporate Research . 311
Blind Separation of Non-Circular Sources
J. Galy — LIRMM, France
C. Adnet — Thomson-CSF Airsys, France . 315
Blind Identification of Slightly Delayed Mixtures
G. Chabriel and J. Barrère — Université de Toulon et du Var, France . 319
Robust Source Separation Using Ranks
L. Xiang, Y. Zhang and S. A. Kassam — University of Pennsylvania . 324
Semi-Blind Maximum Likelihood Separation of Linear Convolutive Mixtures
J. Xavier and V. Barroso — Instituto Superior Técnico — Instituto de Sistemas e Robótica, Portugal . 329
Techniques for Blind Source Separation Using Higher-Order Statistics
Z. M. Kamran and A. R. Leyman — Nanyang Technological University, Singapore
K. Abed-Meraim — ENST/TSI, France . 334
Joint-Diagonalization of Cumulant Tensors and Source Separation
E. Moreau — MS-GESSY, ISITV, France . 339
New Criteria for Blind Signal Separation
N. Thirion-Moreau and E. Moreau — MS-GESSY, ISITV, France . 344
An Iterative Algorithm Using Second Order Moments Applied to Blind Separation of Sources with Same Spectral
Densities
J. Cavassilas, B. Xerri and B. Borloz — Université de Toulon et du Var, France . 349
Performance of Cumulant Based Inverse Filter Criteria for Blind Deconvolution of Multi-Input Multi-Output Linear
Time-Invariant Systems
C. Chi and C. Chen — National Tsing Hua University, Taiwan . 354
Separation of Non Stationary Sources; Achievable Performance
J. Cardoso — C.N.R.S./E.N.S.T., France . 359
Modified BSS Algorithms Including Prior Statistical Information about Mixing Matrix
J. Igual and L. Vergara — Universidad Politecnica Valencia, Spain . 364
Approximate Maximum Likelihood Blind Source Separation with Arbitrary Source PDFs
M. Ghogho and T. Durrani — University of Strathclyde, UK
A. Swami — Army Research Lab . 368
Session TA-2. SPECTRUM ESTIMATION II
Power Spectral Density Analysis of Randomly Switched Pulse Width Modulation for DC/AC Converters
R. L. Kirlin — University of Victoria, Canada
M. M. Bech — University of Aalborg, Denmark
A. M. Trzynadlowski — University of Nevada Reno . 373
Study on Spectral Analysis and Design for DC/DC Conversion Using Random Switching Rate PWM
R. L. Kirlin, J. Wang, and R. M. Dizaji — University of Victoria, Canada . 378
Spectral Subtraction and Spectral Estimation
M. A. Lagunas and A. I. Perez-Neira — Campus Nord UPC, Spain . 383
Parameter Estimation: The Ambiguity Problem
V. Lefkaditis and A. Manikas — Imperial College of Science, Technology and Medicine, UK . 387
On Multiwindow Estimators for Correlation
A. Hanssen — University of Tromsø, Norway . 391
Asymptotic Analysis of the Least Squares Estimate of 2-D Exponentials in Colored Noise
G. Cohen and J. M. Francos — Ben-Gurion University, Israel . 396
Cross-Spectral Methods for Processing Biological Signals
D. J. Nelson — Department of Defense . 400
Default Prior for Robust Bayesian Model Selection of Sinusoids in Gaussian Noise
C. Andrieu — Cambridge University, UK
J.-M. Perez — Universidad Simón Bolívar, Venezuela . 405
On the Exact Solution to the “Gliding Tone” Problem
L. Galleani and L. Cohen — City University of New York . 410
Baseline and Distribution Estimates of Complicated Spectra
D. J. Thomson — Bell Labs . 414
Session TA-3. ARRAY PROCESSING II
Distributed Source Localization with Multiple Sensor Arrays and Frequency-Selective Spatial Coherence
R. J. Kozick — Bucknell University
B. M. Sadler — Army Research Laboratory . 419
Deterministic Maximum Likelihood DOA Estimation in Heterogeneous Propagation Media
P. Stoica — Uppsala University, Sweden
O. Besson — ENSICA, France
A. B. Gershman — McMaster University, Canada . 424
Efficient Signal Detection in Perturbed Arrays
A. M. Rao and D. L. Jones — University of Illinois . 429
A Neural Network Approach for DOA Estimation and Tracking
L. Badidi and L. Radouane — LESSI, Morocco . 434
Partially Adaptive Array Algorithm Combined with CFAR Technique in Transform Domain
S. Moon, D. Yun, and D. Han — Kyungpook National University, Korea . 439
A New Beamforming Algorithm Based on Signal Subspace Eigenvectors
M. Biguesh and M. H. Bastani — Sharif University of Technology, Iran
S. Valaee — Tarbiat Modares University, Iran
B. Champagne — McGill University, Canada . 444
Detection of Sources in Array Processing Using the Bootstrap
R. Brcich, P. Pelin and A. Zoubir — Curtin University of Technology, Australia . 448
Robust Localization of Scattered Sources
J. Tabrikian — Ben-Gurion University, Israel
H. Messer — Tel Aviv University, Israel . 453
Session TP-1. APPLICATION OF JOINT TIME-FREQUENCY TECHNIQUES IN RADAR PROCESSING
ISAR Imaging and Crystal Structure Determination from EXAFS Data Using a Super-Resolution Fast Fourier
Transform
G. Zweig — Signition, Inc.
B. Wohlberg — Los Alamos National Laboratory . 458
Analysis of Radar Micro-Doppler Signature With Time-Frequency Transform
V. C. Chen — Naval Research Laboratory . 463
Estimating the Parameters of Multiple Wideband Chirp Signals in Sensor Arrays
A. B. Gershman and M. Pesavento — McMaster University, Canada
M. G. Amin — Villanova University . 467
On the Use of Space-Time Adaptive Processing and Time-Frequency Data Representations for Detection of
Near-Stationary Targets in Monostatic Clutter
D. C. Braunreiter, H.-W. Chen, M. L. Cassabaum, J. G. Riddle, A. A. Samuel, J. F. Scholl and H. A. Schmitt —
Raytheon Missile Systems . 472
Application of Adaptive Joint Time-Frequency Processing to ISAR Image Formation
H. Ling and J. Li — University of Texas at Austin . 476
Joint Time-Frequency Analysis of SAR Data
R. Fiedler and R. Jansen — Naval Research Laboratory . 480
Pulse Propagation in Dispersive Media
L. Cohen — City University of New York . 485
Session TP-2. NETWORK TRAFFIC MODELING
Wavelet-Based Models for Network Traffic
D. Wei and H. Cheng — Drexel University . 490
The Extended On/Off Process for Modeling Traffic in High-Speed Communication Networks
X. Yang, A. P. Petropulu and V. Adams — Drexel University . 495
A Simulation Study of the Impact of Switching Systems on Self-Similar Properties of Traffic
Y. Zhou and H. Sethu — Drexel University . 500
Parameter Estimation in Farima Processes with Applications to Network Traffic Modeling
J. Ilow — Dalhousie University, Canada . 505
Session TP-3. SIGNAL PROCESSING FOR GPS
Nonlinear Filtering Algorithm with Its Application in INS Alignment
R. Zhao and Q. Gu — Tsinghua University, China . 510
GPS Jammer Suppression with Low-Sample Support Using Reduced-Rank Power Minimization
W. L. Myrick and M. D. Zoltowski — Purdue University
J. S. Goldstein — SAIC . 514
Jammer Excision in Spread Spectrum Using Discrete Evolutionary-Hough Transform and Singular Value
Decomposition
R. Suleesathira and L. F. Chaparro — University of Pittsburgh . 519
Spatial and Temporal Processing of GPS Signals
P. Xiong and S. N. Batalama — State University of New York at Buffalo
M. J. Medley — Air Force Research Laboratory . 524
Subspace Projection Techniques for Anti-FM Jamming GPS Receivers
L. Zhao and M. G. Amin — Villanova University
A. R. Lindsey — Air Force Research Laboratory . 529
Session TP-4. WAVELETS
Fixed-Point HAAR-Wavelet-Based Echo Canceller
M. Doroslovacki and I. Khan — George Washington University
B. Kosanovic — Texas Instruments . 534
Wavelet-Polyspectra: Analysis of Non-Stationary and Non-Gaussian/Non-Linear Signals
Y. Larsen and A. Hanssen — University of Tromsø, Norway . 539
Adaptive Seismic Compression by Wavelet Shrinkage
M. F. Khène and S. H. Abdul-Jauwad — King Fahd University of Petroleum & Minerals, Saudi Arabia . 544
Representations of Stochastic Processes Using COIFLET-Type Wavelets
D. Wei and H. Cheng — Drexel University . 549
Session WA-1. TIME-FREQUENCY ANALYSIS
Time-Frequency Coherence Analysis of Nonstationary Random Processes
G. Matz and F. Hlawatsch — Vienna University of Technology, Austria . 554
Multi-Component IF Estimation
Z. M. Hussain and B. Boashash — Queensland University of Technology, Australia . 559
Detection of Seizures in Newborns Using Time-Frequency Analysis of EEG Signals
B. Boashash, H. Carson and M. Mesbah — Queensland University of Technology, Australia . 564
Multitaper Reduced Interference Distribution
S. Aviyente and W. J. Williams — University of Michigan . 569
Instantaneous Spectral Skew and Kurtosis
P. J. Loughlin and K. L. Davidson — University of Pittsburgh . 574
Adaptive Time-Frequency Representations for Multiple Structures
A. Papandreou-Suppappola — Arizona State University
S. B. Suppappola — Pipeline Technologies, Inc . 579
A Resolution Performance Measure for Quadratic Time-Frequency Distributions
B. Boashash and V. Sucic — Queensland University of Technology, Australia . 584
The Wigner Distribution for Ordinary Linear Differential Equations and Wave Equations
L. Galleani and L. Cohen — City University of New York . 589
Application of Time-Frequency Techniques for the Detection of Anti-Personnel Landmines
B. Barkat, A. M. Zoubir and C. L. Brown — Curtin University of Technology, Australia . 594
A New Matrix Decomposition Based on Optimum Transformation of the Singular Value Decomposition Basis Sets
Yields Principal Features of Time-Frequency Distributions
D. Groutage — Naval Surface Warfare Center
D. Bennink — Applied Measurements Systems Intl. . 598
Minimum Entropy Time-Frequency Distributions
A. El-Jaroudi — University of Pittsburgh . 603
Uncertainty in the Time-Frequency Plane
P. M. Oliveira — Escola Naval, Portugal
V. Barroso — Instituto Superior Técnico ISR/DEEC, Portugal . 607
High Resolution Frequency Tracking via Non-Negative Time-Frequency Distributions
R. M. Nickel and W. J. Williams — University of Michigan . 612
Session WA-2. HIGHER-ORDER SPECTRAL ANALYSIS
A Cumulant Subspace Approach to FIR Multiuser Channel Estimation
J. Liang and Z. Ding — University of Iowa . 616
An Efficient Fourth Order System Identification (FOSI) Algorithm Utilizing the Joint Diagonalization Procedure
A. Belouchrani — Ecole Nationale Polytechnique, Algeria
B. Derras — Cirrus Logic Inc. . 621
Unity-Gain Cumulant-Based Adaptive Line Enhancer
R. R. Gharieb and A. Cichocki — RIKEN, Japan
Y. Horita and T. Murai — Toyama University, Japan . 626
Adaptive Detection and Extraction of Sparse Signals Embedded in Colored Gaussian Noise Using Higher Order
Statistics
R. R. Gharieb and A. Cichocki — RIKEN, Japan
S. F. Filipowicz — Warsaw University of Technology, Poland . 631
Higher-Order Matched Field Processing
R. M. Dizaji, R. L. Kirlin, and N. R. Chapman — University of Victoria, Canada . 635*
Multiwindow Bispectral Estimation
Y. Birkelund and A. Hanssen — University of Tromsø, Norway . 640
Session WA-3. SIGNAL PROCESSING FOR COMMUNICATIONS III
Global Convergence of a Single-Axis Constant Modulus Algorithm
A. Shah, S. Biracree, R. A. Casas, T. J. Endres, S. Hulyalkar, T. A. Schaffer, and C. H. Strolle — NxtWave
Communications . 645
A Novel Modulation Method for Secure Digital Communications
A. Salberg and A. Hanssen — University of Tromsø, Norway . 650
A Multitime-Frequency Approach for Detection and Classification of Noisy Frequency Modulations
M. Colas, G. Gelle, and G. Delaunay — L.A.M.-URCA, France
J. Galy — L.I.R.M.M., France . 655
NDA PLL Design for Carrier Phase Recovery of QPSK/TDMA Bursts without Preamble
J. Lee — COMSAT Laboratories . 660
An Optimized Multi-Tone Calibration Signal for Quadrature Receiver Communication Systems
R. A. Green — North Dakota State University . 664
A Polynomial Rooting Approach for Synchronization in Multipath Channels Using Antenna Arrays
G. Seco and J. A. Fernández-Rubio — Univ. Politècnica de Catalunya, Spain
A. L. Swindlehurst — Brigham Young University . 668
Super-Exponential-Estimator for Fast Blind Channel Identification of Mobile Radio Fading Channels
A. Schmidbauer — Munich University of Technology, Germany . 673
Finite Data Record Maximum SINR Adaptive Space-Time Processing
I. N. Psaromiligkos and S. N. Batalama — State University of New York at Buffalo . 677
On the Effects of Rotating Blades on DS/SS Communication Systems
Y. Zhang and M. G. Amin — Villanova University
V. Mancuso — Boeing Helicopter Division . 682
Joint Synchronization and Symbol Detection in Asynchronous DS-CDMA Systems
F. Rey, G. Vázquez, and J. Riba — Polytechnic University of Catalonia, Spain . 687
New Criteria for Blind Equalization of M-PSK Signals
Z. Xu and P. Liu — University of California . 692
Third-Order Blind Equalization Properties of Hexagonal Constellations
C. D. Murphy — Helsinki University of Technology, Finland . 697
Session WA-4. ACOUSTICAL SIGNAL PROCESSING
Comparison of the Cyclostationary and the Bilinear Approaches: Theoretical Aspects and Applications to Industrial
Signals.
L. Bouillaut and M. Sidahmed — Université de Technologie de Compiègne, France . 702
Array Processing of Underwater Acoustic Sensors Using Weighted Fourier Integral Method
I. S. D. Solomon and A. J. Knight — Defence Science and Technology Organisation, Australia . 707
A Hierarchical Algorithm for Nearfield Acoustic Imaging
M. Peake and M. Karan — CSSIP, Australia
D. Gray — University of Adelaide, Australia . 712
An Introduction to Synthetic Aperture Sonar
D. Marx, M. Nelson, E. Chang, W. Gillespie, A. Putney, and K. Warman — Dynamics Technology, Inc . 717
Classification of Acoustic and Seismic Data Using Nonlinear Dynamical Signal Models
R. K. Lennartsson — Defence Research Establishment, Sweden
A. Pentek and J. B. Kadtke — University of California . 722
The Performance of Sparse Time-Reversal Mirrors in the Context of Underwater Communications
J. Gomes and V. Barroso — Instituto Superior Técnico — Instituto de Sistemas e Robótica, Portugal . 727
Beam Patterns of an Underwater Acoustic Vector Hydrophone
K. T. Wong — Chinese University of Hong Kong, China
H. Chi — Purdue University . 732
MULTISTAGE MULTIUSER DETECTION FOR CDMA
WITH SPACE-TIME CODING
Yumin Zhang and Rick S. Blum
EECS Department, Lehigh University
Bethlehem, PA 18015
rblum@eecs.lehigh.edu
ABSTRACT
The combination of Turbo codes and space-time block codes is studied
for use in CDMA systems. Each user’s data are first encoded by a Turbo
code. The Turbo coded data are next sent to a space-time block encoder
which employs a BPSK constellation. The space-time encoder output
symbols are transmitted through the fading channel using multiple
antennas. A multistage receiver is proposed using non-linear MMSE
estimation and a parallel interference cancellation scheme. Simulations
show that with reasonable levels of multiple access interference
($\rho < 0.3$), near single user performance is achieved. The receiver
structure is generalized to decode CDMA signals with space-time
convolutional coding and similar performance is observed.
1. INTRODUCTION
Space-time codes [1]-[4] use multiple transmit and receive antennas to
achieve diversity and coding gain for communication over fading
channels. High bandwidth efficiency is achieved, with performance close
to the theoretical outage capacity [1]. Turbo codes [5] are a family of
powerful channel codes, which have been shown to achieve near Shannon
capacity over additive white Gaussian noise channels. Since their
introduction, both space-time codes and Turbo codes have received
considerable attention. In the CDMA2000 Radio Transmission Technology
(RTT) proposed for the third generation systems, both space-time codes
and Turbo codes have been adopted [6].
Although papers treating either just space-time codes or Turbo codes
abound, jointly considering space-time codes and Turbo codes in CDMA
systems is a relatively new topic. In this paper, we initiate a study
on this topic where we focus on space-time block codes [3], [4]. Our
research develops suboptimum low-complexity receivers, which will be
needed in practice.
This paper is organized as follows. Section 2 first sets up the system configuration and develops the received signal model. A brief review of space-time block codes is given in Section 3. The structure of our multistage receiver is discussed in Section 4. Section 5 presents simulation results. Conclusions are given in Section 6.
2. SYSTEM CONFIGURATION AND
RECEIVED SIGNAL MODEL
Fig. 2 depicts a $K$ user synchronous CDMA system with combined Turbo coding and space-time block coding. There are $N$ transmit antennas and $M$ receive antennas in the system. Suppose user $k$, $k = 1, \ldots, K$, has a block of binary information bits $\{d_k(i), i = 1, \ldots, L_1\}$ to transmit. These bits are first encoded by a Turbo code with rate $R_1 = L_1/L_2$. The bits which are produced by the Turbo encoder, denoted by $\{\tilde{d}_k(i), i = 1, \ldots, L_2\}$, are passed to a space-time block encoder. This space-time block code uses a transmission matrix $G_N$ [3] with a BPSK constellation, generates $N$ output bits during each time slot, and has rate $R_2 = L_2/L$. During time slot $l$, $N$ bits are transmitted, which are denoted by $\{b_{nk}(l), n = 1, \ldots, N\}$, for $l = 0, \ldots, L-1$. The bit $b_{nk}(l) \in \{-1, +1\}$ is spread using a unique spreading waveform $s_k(t)$ and transmitted using antenna $n$. For convenience we denote the vector of $n$th output bits from all $K$ users as $\mathbf{b}_n(l) = [b_{n1}(l), \ldots, b_{nK}(l)]^T$, and we note that all of these bits are transmitted by antenna $n$ during time slot $l$. We define the set of bits $\{\mathbf{b}_n(l), l = 0, \ldots, L-1\}$ as one frame of data.
The fading coefficient for the path between transmit antenna $n$ and receive antenna $m$ is denoted by $\alpha_{nm}$. In our research, we assume a flat quasi-static fading environment [3], where the fading coefficients are constant during a frame and are independent from one frame to another. Further, we assume for simplicity that perfect estimates of all fading coefficients are available at the receiver. The received signal at antenna $m$ is

$$r_m(t) = \sum_{n=1}^{N} \sum_{k=1}^{K} \sum_{l=0}^{L-1} \alpha_{nm} A_k b_{nk}(l)\, s_k(t - lT) + \eta_m(t) \qquad (1)$$

where $T$ is the bit period, $A_k$ is the transmitted signal amplitude for user $k$, and $\eta_m(t)$ is the complex channel noise at receive antenna $m$. The received signal $r_m(t)$ is next passed through a matched filter bank, with each filter matched to one user's spreading waveform. Denote the matched filter outputs at receive antenna $m$ for the time slot $j$ by $\mathbf{y}_m(j) = [y_{m1}(j), \ldots, y_{mK}(j)]^T$. The equation describing $\mathbf{y}_m(j)$ can be represented in vector form as

$$\mathbf{y}_m(j) = \mathbf{R}\mathbf{A} \sum_{n=1}^{N} \alpha_{nm} \mathbf{b}_n(j) + \mathbf{n}_m(j), \qquad m = 1, \ldots, M, \quad j = 0, \ldots, L-1, \qquad (2)$$

where $\mathbf{R}$ is the $K \times K$ cross-correlation matrix of the spreading codes, $\mathbf{A} = \mathrm{diag}(A_1, \ldots, A_K)$, and $\mathbf{n}_m(j)$ is the $K \times 1$ complex noise vector after matched filtering. Assuming the channel noise is Gaussian with zero mean and autocorrelation function $\sigma^2 \delta(\tau)$, $\mathbf{n}_m(j)$ has a multidimensional Gaussian distribution $\mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{R})$.

0-7803-5988-7/00/$10.00 © 2000 IEEE
3. SPACE-TIME BLOCK CODES
An extensive discussion of space-time block codes is given in [3], [4]. Here we consider only $N = 2$ antenna cases. Extension to $N > 2$ cases is straightforward. A BPSK space-time block code with two transmit antennas is described by the transmission matrix

$$G_2 = \begin{pmatrix} s_1 & s_2 \\ -s_2 & s_1 \end{pmatrix}. \qquad (3)$$

The encoder works as follows. The block of $L_2$ Turbo coded bits enters the encoder and is grouped into units of two bits. Each group of two bits is mapped to a pair of BPSK symbols $s_1$ and $s_2$. These symbols are transmitted during two consecutive time slots. During the first time slot, $s_1$ and $s_2$ are transmitted simultaneously from antenna one and two, respectively. During the second time slot, $-s_2$ and $s_1$ are transmitted simultaneously from antenna one and two, respectively. The code rate of $G_2$ is 1.
In [3], [4], the transmission matrix is designed so that the columns are orthogonal to each other. This allows a simple receiver structure using only linear processing. We illustrate this using the code described in (3) as an example. Extension to $N > 2$ cases is straightforward. Assuming there are $M$ receive antennas, the received signals at antenna $m$ during the first and second time slots, denoted by $y_m(1)$ and $y_m(2)$, are

$$y_m(1) = \alpha_{1m} s_1 + \alpha_{2m} s_2 + n_m(1)$$
$$y_m(2) = -\alpha_{1m} s_2 + \alpha_{2m} s_1 + n_m(2) \qquad (4)$$

where $n_m(1)$ and $n_m(2)$ are two iid complex Gaussian noise samples with variance $\sigma^2$. The observations in (4) can be combined to yield the improved quantities $\hat{s}_1$ and $\hat{s}_2$ using

$$\hat{s}_1 = \alpha_{1m}^* y_m(1) + \alpha_{2m} y_m^*(2) = (|\alpha_{1m}|^2 + |\alpha_{2m}|^2) s_1 + \alpha_{1m}^* n_m(1) + \alpha_{2m} n_m^*(2)$$
$$\hat{s}_2 = \alpha_{2m}^* y_m(1) - \alpha_{1m} y_m^*(2) = (|\alpha_{1m}|^2 + |\alpha_{2m}|^2) s_2 + \alpha_{2m}^* n_m(1) - \alpha_{1m} n_m^*(2)$$
Combining quantities obtained at each receive antenna yields

$$\hat{s}_1 = \sum_{m=1}^{M} \left( \alpha_{1m}^* y_m(1) + \alpha_{2m} y_m^*(2) \right) = C s_1 + n_1$$
$$\hat{s}_2 = \sum_{m=1}^{M} \left( \alpha_{2m}^* y_m(1) - \alpha_{1m} y_m^*(2) \right) = C s_2 + n_2 \qquad (5)$$

where

$$C = \sum_{m=1}^{M} \left( |\alpha_{1m}|^2 + |\alpha_{2m}|^2 \right). \qquad (6)$$

The Gaussian noise variables $n_1$ and $n_2$ have variance

$$\sigma_b^2 = \sigma^2 \sum_{m=1}^{M} \left( |\alpha_{1m}|^2 + |\alpha_{2m}|^2 \right). \qquad (7)$$

It is easily seen from (5), (6) and (7) that after this simple linear combining, the resulting signals are equivalent to those obtained from using maximal ratio combining [7] techniques for systems with 1 transmit antenna and $2M$ receive antennas. This combining technique will be used in two places in our low-complexity receiver as discussed in the next section.
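The combining rule in (4)-(6) can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `combine_g2` and the test values are hypothetical and chosen only for illustration. In the noiseless case the combined outputs should equal $C s_1$ and $C s_2$ exactly, which is what the final lines verify.

```python
import numpy as np

def combine_g2(y1, y2, alpha):
    """Linear combining for the two-antenna BPSK block code (illustrative).

    y1, y2 : length-M arrays of received samples in time slots 1 and 2.
    alpha  : 2 x M array; alpha[n, m] is the gain from transmit antenna n
             to receive antenna m (assumed perfectly known).
    Returns (s1_hat, s2_hat, C) as in equations (5) and (6).
    """
    a1, a2 = alpha[0], alpha[1]
    s1_hat = np.sum(np.conj(a1) * y1 + a2 * np.conj(y2))
    s2_hat = np.sum(np.conj(a2) * y1 - a1 * np.conj(y2))
    C = np.sum(np.abs(a1) ** 2 + np.abs(a2) ** 2)
    return s1_hat, s2_hat, C

# Noiseless check with M = 2 receive antennas and random complex gains
# (variance 0.5 per real dimension).
rng = np.random.default_rng(0)
M = 2
alpha = np.sqrt(0.5) * (rng.standard_normal((2, M))
                        + 1j * rng.standard_normal((2, M)))
s1, s2 = 1.0, -1.0                      # BPSK symbols
y1 = alpha[0] * s1 + alpha[1] * s2      # slot 1: s1, s2 from antennas 1, 2
y2 = -alpha[0] * s2 + alpha[1] * s1     # slot 2: -s2, s1 from antennas 1, 2
s1_hat, s2_hat, C = combine_g2(y1, y2, alpha)
```

Because the symbols are real, the cross terms in $s_2$ (respectively $s_1$) cancel, leaving only the gain $C$ on the desired symbol.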
4. LOW-COMPLEXITY MULTISTAGE
RECEIVER
The optimum receiver that minimizes the frame error rate should construct a “super-trellis” for decoding. The super-trellis combines the trellis of Turbo codes and the structure of the multiuser channel and space-time block codes. Due to the interleavers used in the Turbo codes, it is very hard to construct such a super-trellis. In fact, “optimum decoding” for Turbo codes alone is impossible in practice. This is why suboptimum iterative decoding schemes are used to decode Turbo codes [5]. Thus instead of trying to find an optimum receiver, which would obviously have a prohibitively high complexity, our goal in this section is to develop a low-complexity suboptimum receiver.

We suggest the multistage receiver structure depicted in Fig. 2. The output of the matched filter bank is first passed to a decorrelating detector [8], which attempts to eliminate the multiple access interference (MAI) completely with perfect estimation. The output of the decorrelating detector at receive antenna $m$ and time slot $j$ is

$$\tilde{\mathbf{y}}_m(j) = (\mathbf{R}\mathbf{A})^{-1} \mathbf{y}_m(j) = \sum_{n=1}^{N} \alpha_{nm} \mathbf{b}_n(j) + \tilde{\mathbf{n}}_m(j) \qquad (8)$$

where we defined the noise vector $\tilde{\mathbf{n}}_m(j) = (\mathbf{R}\mathbf{A})^{-1} \mathbf{n}_m(j)$, which has a Gaussian distribution with covariance matrix

$$\tilde{\mathbf{R}} = \sigma^2 (\mathbf{A}\mathbf{R}\mathbf{A})^{-1}. \qquad (9)$$
The elements from $\tilde{\mathbf{y}}_1(j), \ldots, \tilde{\mathbf{y}}_M(j)$ corresponding to the $k$th user, denoted by $\tilde{y}_{1k}(j), \ldots, \tilde{y}_{Mk}(j)$, are combined using the technique discussed in Section 3 to provide improved observations for user $k$. These improved observations are sent to a single user Turbo decoder to perform the first stage of decoding. The Turbo decoder produces posterior probabilities for user $k$'s transmitted bits. These posterior probabilities, together with the diversity combined observations, are used by a soft estimator to form soft estimates of user $k$'s transmitted bits.
The soft estimator uses non-linear minimum mean square error (MMSE) estimation [9] to form the soft estimates. From (5), it is seen that the diversity combined observations for user $k$ can always be represented in the form of $y = Cb + n$, where $y$ is the noisy observation, $b$ is the transmitted bit, $C$ is a known constant and $n$ is a complex Gaussian noise sample with variance denoted by $\sigma_b^2$. The soft estimate of $b$ is obtained by

$$\hat{b} = E\{b \mid y\} = \frac{\Pr(b=+1)\, e^{2\mathrm{Re}(Cy^*)/\sigma_b^2} - \Pr(b=-1)\, e^{-2\mathrm{Re}(Cy^*)/\sigma_b^2}}{\Pr(b=+1)\, e^{2\mathrm{Re}(Cy^*)/\sigma_b^2} + \Pr(b=-1)\, e^{-2\mathrm{Re}(Cy^*)/\sigma_b^2}} \qquad (10)$$

where the prior probabilities $\Pr(b = \pm 1)$ can be updated using the posterior probabilities obtained by the Turbo decoders.
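The conditional-mean estimate in (10) can be written compactly as a hyperbolic tangent of half the posterior log-likelihood ratio, which avoids overflow for large exponents. The sketch below assumes NumPy; the function names `soft_estimate` and `soft_estimate_direct` are hypothetical, and the second function simply evaluates the ratio of exponentials term by term so the two forms can be compared.

```python
import numpy as np

def soft_estimate(y, C, sigma_b2, p_plus=0.5):
    """Soft BPSK estimate E{b | y} for y = C*b + n (illustrative sketch).

    p_plus is the prior Pr(b = +1), assumed to lie strictly in (0, 1).
    Uses E{b | y} = tanh(LLR / 2) with
    LLR = log(Pr(+1)/Pr(-1)) + 4 Re(C y*) / sigma_b^2.
    """
    prior_llr = np.log(p_plus) - np.log1p(-p_plus)
    llr = prior_llr + 4.0 * np.real(C * np.conj(y)) / sigma_b2
    return np.tanh(0.5 * llr)

def soft_estimate_direct(y, C, sigma_b2, p_plus=0.5):
    """Same quantity, evaluated as the ratio of exponentials."""
    a = 2.0 * np.real(C * np.conj(y)) / sigma_b2
    num = p_plus * np.exp(a) - (1.0 - p_plus) * np.exp(-a)
    den = p_plus * np.exp(a) + (1.0 - p_plus) * np.exp(-a)
    return num / den

# Example values (arbitrary, for illustration only).
y_obs = 0.8 + 0.3j
b_tanh = soft_estimate(y_obs, C=1.5, sigma_b2=2.0, p_plus=0.7)
b_direct = soft_estimate_direct(y_obs, C=1.5, sigma_b2=2.0, p_plus=0.7)
```

The estimate always lies in $[-1, +1]$ and approaches a hard decision as the SNR or the prior confidence grows.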
The transmitted signals are reconstructed using the soft estimates as if they were binary digits. Denote the reconstructed encoder output for antenna $n$ and user $k$ during time slot $j$ as $\hat{b}_{nk}(j)$ and define $\hat{\mathbf{b}}_n(j) = [\hat{b}_{n1}(j), \ldots, \hat{b}_{nK}(j)]^T$. The reconstructed signals $\{\hat{\mathbf{b}}_n(j), n = 1, \ldots, N, j = 0, \ldots, L-1\}$ are used in soft MAI cancellation to produce “cleaner” received signals for each user. To cancel MAI for user $k$, we first define a vector $\hat{\mathbf{b}}_n^{(k)}(j)$ equal to $\hat{\mathbf{b}}_n(j)$ except that its $k$th element is zero. The MAI-reduced observation for user $k$ at receive antenna $m$ is obtained using

$$\hat{\mathbf{y}}_m^{(k)}(j) = \mathbf{y}_m(j) - \mathbf{R}\mathbf{A} \sum_{n=1}^{N} \alpha_{nm} \hat{\mathbf{b}}_n^{(k)}(j). \qquad (11)$$

When a perfect estimate of $\mathbf{b}_n(j)$ is available, $\hat{\mathbf{y}}_m^{(k)}(j)$ offers $K$ different observations of the signal from user $k$, contaminated only by channel noise. For simplicity, we use the $k$th element of $\hat{\mathbf{y}}_m^{(k)}(j)$ for processing, which gives the highest SNR for user $k$. The $k$th elements of $\hat{\mathbf{y}}_m^{(k)}(j)$, $m = 1, \ldots, M$, at all receive antennas are combined using the techniques discussed in Section 3. The improved observations are passed to another set of Turbo decoders to perform the second stage of decoding. These Turbo decoders produce the final “hard” decisions on each user's transmitted bits.
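The cancellation step in (11) amounts to one matrix-vector product per receive antenna and time slot. The following sketch, assuming NumPy, illustrates it for a single antenna and slot; the function name `cancel_mai`, the array layouts, and the example parameters (4 users, common cross-correlation 0.3, unit amplitudes) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def cancel_mai(y_m, R, A, alpha_m, b_hat, k):
    """Soft parallel interference cancellation for user k (0-based index).

    y_m     : (K,) matched filter outputs at receive antenna m.
    R       : (K, K) spreading-code cross-correlation matrix.
    A       : (K,) user amplitudes (the diagonal of the amplitude matrix).
    alpha_m : (N,) fading gains from each transmit antenna to antenna m.
    b_hat   : (N, K) soft bit estimates for all antennas and users.
    Returns the (K,) MAI-reduced observation vector of equation (11).
    """
    b_k = np.array(b_hat, dtype=float)
    b_k[:, k] = 0.0                          # keep user k's own signal
    interference = R @ (A * (alpha_m @ b_k))
    return y_m - interference

# Noiseless example with perfect bit estimates.
rng = np.random.default_rng(1)
K, N, rho = 4, 2, 0.3
R = (1.0 - rho) * np.eye(K) + rho * np.ones((K, K))
A = np.ones(K)
alpha_m = np.sqrt(0.5) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
b = rng.choice([-1.0, 1.0], size=(N, K))
y_m = R @ (A * (alpha_m @ b))
k = 2
cleaned = cancel_mai(y_m, R, A, alpha_m, b, k)
expected_k = A[k] * (alpha_m @ b[:, k])      # user k's own contribution
```

With perfect estimates and no noise, the $k$th element of the cleaned vector contains only user $k$'s signal, which is what `expected_k` represents.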
5. SIMULATION RESULTS
Monte Carlo simulations are carried out to study the performance of the proposed multistage receiver. Consider a 4 user synchronous CDMA system with 2 transmit antennas and 2 receive antennas. Each user's bits are first encoded by a rate 1/3 Turbo code with constraint length $v = 5$ and generator (23, 35) in octal form. The random interleaver chosen for the Turbo code has length 128. The block of Turbo coded data is encoded using a space-time block code with the code matrix $G_2$ from (3) and a BPSK constellation. Next the output bits are spread using each user's spreading waveform and the results are transmitted using 2 antennas over the fading channel. The path gains are modeled as samples of independent complex Gaussian random variables with variance 0.5 per dimension (real or imaginary). Quasi-static fading is assumed. For the CDMA channel, we use the symmetric channel model where the cross-correlation between all pairs of two users is the common value $\rho$. The SNR for user $k$ is defined as

$$\mathrm{SNR}_k = \frac{N A_k^2}{\sigma^2 R_1 R_2}. \qquad (12)$$
Fig. 3 gives the BER performance of the proposed multistage receiver in Gaussian noise when all users have the same power ($\mathbf{A} = \mathbf{I}$). The BER performance for the first stage and second stage decoding are both plotted, which we denote by “S1” and “S2” on the graph. For comparison, we also give the single user performance, which is the Turbo code performance for the fading channel under consideration. The performance of the space-time block code using $G_2$ without the Turbo coding is also shown. For $\rho = 0.1$, single user performance is nearly achieved after just the first stage decoding. The second stage decoding curve is indistinguishable from that of the single user performance. For $\rho = 0.3$, the performance improvement obtained by employing the second stage of decoding is obvious from Fig. 3b. After the second stage decoding, single user performance is approached. By combining a Turbo code with a space-time block code, a performance gain of about 2.5 dB is achieved at BER $= 10^{-4}$ compared to using a space-time block code only.
An iterative receiver structure can be easily constructed by feeding back the posterior information obtained after the second stage decoding to the soft estimators. We have carried out simulations using this iterative structure, but results show that the improvement over the second stage of decoding is marginal. In Fig. 3b, we plot the BER performance for the second iteration of the “iterative receiver” (denoted by “Ite 2”), which is almost indistinguishable from the second stage decoding curve. Thus the extra computations incurred by the iterative structure are not justified.
Next we study the performance of our receiver in a near-far situation where two users are 20 dB stronger than the other two users. All other parameters remain the same as in Fig. 3. The BER performance for the strong user and weak user are given in Fig. 4a and 4b, respectively. The performance, for both the weak and strong users, approaches single user performance after the second stage decoding.
Finally, we point out that the received signal model in (2) is also valid for a CDMA system with space-time convolutional coding [1] replacing the combination of space-time block codes and Turbo codes. An iterative receiver can be constructed using the parallel interference cancellation scheme [10]. Fig. 1 gives the frame error rate performance for the first two iterations of the iterative receiver for a CDMA system with space-time convolutional coding. It is seen that with 2 iterations, single user performance is achieved. Another observation is that the performance improvement obtained by employing the iterative structure is marginal. This is consistent with our previous observations for the space-time block coded system.
6. CONCLUSIONS
In this paper, we studied the application of Turbo codes and space-time block codes in CDMA systems. A multistage receiver is proposed using parallel interference cancellation schemes. Simulation results show that with reasonable levels of MAI ($\rho < 0.3$), near single user performance can be achieved. The receiver developed in this paper was generalized to decode CDMA signals with space-time convolutional coding and similar performance was observed.
7. REFERENCES
[1] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance criteria and code construction,” IEEE Trans. Info. Theo., vol. 44, No. 2, pp. 744-765, Mar. 1998.
[2] S. M. Alamouti, “A simple transmitter diversity scheme for wireless communications,” IEEE JSAC, vol. 16, No. 8, pp. 1451-1458, Oct. 1998.
[3] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block coding for wireless communications: performance results,” IEEE JSAC, vol. 17, No. 3, pp. 451-460, March 1999.
[4] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Trans. Info. Theo., vol. 45, No. 5, pp. 1456-1467, July 1999.
[5] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: Turbo-Codes,” IEEE Trans. Comm., vol. 44, No. 10, pp. 1261-1271, Oct. 1996.
[6] S. Dennett, “The CDMA2000 ITU-R RTT candidate submission,” V. 0.17, TIA, July 28, 1998.
[7] J. G. Proakis, Digital Communications, 3rd Edition, McGraw-Hill, 1995.
[8] S. Verdu, Multiuser Detection, UK: Cambridge University Press, 1998.
[9] A. Papoulis, Probability, Random Variables and Stochastic Processes, New York: McGraw-Hill, 1984.
[10] Yumin Zhang, Iterative and Adaptive Receivers For Wireless Communication and Radar Systems, Ph.D. Dissertation, Lehigh University, May 2000.
Figure 1: Performance of the iterative multiuser receiver for CDMA with space-time convolutional coding [10] with $K = 4$, $\rho = 0.3$, 4-PSK S-T code with rate 2 b/s/Hz, 130 symbols per frame, 2 transmit and 2 receive antennas, where MMSE is used in the first stage decoding.
Figure 2: Structure of our $K$ user CDMA system (including our multistage receiver) with combined Turbo coding and space-time block coding, $N$ transmit antennas and $M$ receive antennas.
Figure 3: Performance of the multistage receiver for CDMA with Turbo coding and space-time block coding with $K = 4$ users, 2 transmit and 2 receive antennas. Panels: (a) $\rho = 0.1$, (b) $\rho = 0.3$. Horizontal axis: SNR (dB).
Figure 4: Performance of the multistage receiver for CDMA with Turbo coding and space-time block coding under a near-far situation with $K = 4$, $\rho = 0.3$, 2 transmit and 2 receive antennas. Two users are 20 dB stronger than the other two users. Panels: (a) strong user, (b) weak user. Horizontal axis: SNR (dB).
ADAPTIVE MAP MULTI-USER DETECTION FOR FADING CDMA CHANNELS
Christophe Andrieu†, Arnaud Doucet†, Azzedine Touzni*
†Signal Processing Group, Engineering Dept., Cambridge University
Trumpington Street, CB2 1PZ Cambridge, UK.
*NxtWave Communications, Langhorne, PA 19047, USA.
ca226@eng.cam.ac.uk, ad2@eng.cam.ac.uk, atouzni@nxtwavecomm.com
ABSTRACT
This paper presents an adaptive multi-user maximum a posteriori (MAP) decoder for synchronous code division multiple access (CDMA) signals on fading channels. The key idea is to interpret this problem as an optimal filtering problem. An efficient particle filtering method is then developed to solve this complex estimation problem. Simulation results demonstrate the efficiency of our method.
1 Introduction
Code division multiple access (CDMA) systems have received much attention in recent years [13]. For the case of a known channel with additive Gaussian noise, the maximum likelihood (ML) optimal receiver was presented by Verdu [16]. Lower-complexity linear receivers have also been presented in this case. In the presence of unknown fading channels, the estimation problem to be solved is much more complex. MMSE linear receivers have also been presented in this context. However, it turns out that the rate of adaptation for these linear techniques is not sufficient to track fast-fading channels and more sophisticated approaches are required. Recently, more efficient methods have been proposed; see for example [5], [6] where coupled estimators combining a Viterbi algorithm and an MMSE predictor are presented.
In this paper we follow a Bayesian probabilistic approach. A state-space model is included to model explicitly the nonstationarity of the fading channel. This allows us to formulate the problem of estimating a posteriori symbol probabilities as a complex optimal filtering problem. Under assumptions detailed later on, it is well known that exact computation of these probabilities involves a prohibitive computational cost, exponential in the (growing) number of observations. Thus one needs to perform some approximations.
We present here a simulation-based method for solving this problem. This so-called particle filtering method can be viewed as a randomized adaptive grid approximation of the posterior distribution. As will be shown later, the particles (values of the grid) evolve randomly in time according to a simulation-based rule. The weights of the particles are updated according to Bayes' rule. The most striking advantage of these Monte Carlo (MC) particle filters is that the rate of convergence of the error towards zero is independent of the state dimension. That is, the randomization implicit in the particle filter gets around the curse of dimensionality. Taking advantage of the increase of computational power and the availability of parallel computers, several authors have recently proposed such particle methods following the seminal paper of Gordon et al. [11]; see [7], [8] for a summary of the state-of-the-art and [2], [14], [15] for other applications in digital communications. It has been shown that these methods outperform the standard suboptimal methods.

C. Andrieu is sponsored by AT&T Laboratories, Cambridge UK.
We propose in this paper an improved particle method where the filtering distribution of interest is approximated by a Gaussian mixture of a large number, say $N$, of components which evolve stochastically over time and are driven by the observations. Though it is rather computationally intensive, it can be easily implemented on parallel processors.
The rest of the paper is organized as follows. In Section 2, we state the model and the estimation objectives. In Section 3, we describe particle filtering methods. Finally, we demonstrate the efficiency of our algorithm in Section 4.
2 System Model and Estimation Objectives
2.1 System model
We follow here the presentation in [5], [6]. Consider a synchronous CDMA system with a single antenna at the centralized receiver. The system has $M$ users, each transmitting using a known direct sequence (DS) spreading code with processing gain $G$ (i.e. $G$ chips per symbol). For user $m$, the spreading code is represented by the $G \times 1$ vector $\mathbf{s}_m = [s_{m,0}, \ldots, s_{m,G-1}]^T$. At time $t$, user $m$ transmits a symbol $x_{m,t}$ of period $T = G T_c$, where $T_c$ is the chip interval. Each chip $s_{m,c} x_{m,t}$ is affected by the flat-fading channel $f_{m,k}$, represented at the chip rate, where $k = Gt + c$. Note that $t$ is used as an index at the symbol rate, and $k$ is used as an index at the chip rate.
At the receiver, the incoming signal is sampled at the chip rate to obtain $z_k$. Assuming a synchronous system, the received samples are given by

$$z_k = \sum_{m=1}^{M} x_{m,\lfloor k/G \rfloor}\, s_{m,\mathrm{mod}(k,G)}\, f_{m,k} + w_k$$

for $k = 0, \ldots, GT - 1$. In vector-matrix notation

$$\mathbf{z}_t = \sum_{m=1}^{M} x_{m,t}\, \mathbf{S}_m \mathbf{f}_{m,t} + \mathbf{w}_t, \qquad (1)$$

for $t = 0, \ldots, T - 1$, where $\mathbf{S}_m = \mathrm{diag}(\mathbf{s}_m)$, $\mathbf{z}_t = [z_{Gt}, \ldots, z_{G(t+1)-1}]^T$ and $\mathbf{w}_t = [w_{Gt}, \ldots, w_{G(t+1)-1}]^T$ is a vector of zero mean i.i.d. complex Gaussian noise samples with variance $\tfrac{1}{2} E[w_k w_k^*] = N_0/(2T_c)$. We assume that the fading channels $\mathbf{f}_{m,t}$ satisfy the following state-space models

$$\mathbf{f}_{m,t} = \mathbf{A} \mathbf{f}_{m,t-1} + \mathbf{B} \mathbf{v}_{m,t} \qquad (2)$$

where $\mathbf{f}_{m,0}$ is assumed distributed according to a Gaussian distribution and the disturbance noise $\mathbf{v}_{m,t}$ is assumed zero mean i.i.d. Gaussian. We denote $\mathbf{f}_t = [\mathbf{f}_{1,t}, \ldots, \mathbf{f}_{M,t}]$. The initial states $\mathbf{f}_{m,0}$, the sequences $\mathbf{v}_{m,t}$ and the observation noise $\mathbf{w}_t$ are all assumed mutually independent at any time $t$. Finally, we assume that the symbols $\mathbf{x}_t$ are modeled as a first-order (finite state-space) Markov chain. The finite state-space of the symbols is denoted by $X$.
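To make the indexing of the chip-rate model concrete, the sketch below generates received samples for given symbols, codes and channel coefficients and returns them grouped by symbol interval. It assumes NumPy; the function name `simulate_received`, the array layouts, and the randomly drawn channel coefficients are illustrative assumptions (in particular, the channel here is not generated from the state-space model (2)).

```python
import numpy as np

def simulate_received(x, s, f, noise_std, rng):
    """Chip-rate received samples for synchronous DS-CDMA (illustrative).

    x : (M, T) symbols, one row per user, one column per symbol time t.
    s : (M, G) spreading codes.
    f : (M, G*T) chip-rate flat-fading coefficients f[m, k].
    noise_std : standard deviation per real dimension of the complex noise.
    Returns z with shape (T, G); row t holds the samples k = G*t .. G*(t+1)-1.
    """
    M, T = x.shape
    G = s.shape[1]
    x_chip = np.repeat(x, G, axis=1)       # x[m, floor(k / G)]
    s_chip = np.tile(s, (1, T))            # s[m, k mod G]
    w = noise_std * (rng.standard_normal(G * T)
                     + 1j * rng.standard_normal(G * T))
    z = (x_chip * s_chip * f).sum(axis=0) + w
    return z.reshape(T, G)

rng = np.random.default_rng(2)
M, G, T = 3, 10, 5
x = rng.choice([-1.0, 1.0], size=(M, T))             # BPSK symbols
s = rng.choice([-1.0, 1.0], size=(M, G))             # binary codes
f = np.sqrt(0.5) * (rng.standard_normal((M, G * T))
                    + 1j * rng.standard_normal((M, G * T)))
z_clean = simulate_received(x, s, f, 0.0, rng)       # noiseless
z_noisy = simulate_received(x, s, f, 0.1, rng)
```

Each row of the returned array corresponds to one symbol interval, that is, to one vector observation of the vector-matrix form (1).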
2.2 Estimation objectives
Given the observations $\mathbf{z}_{0:t} = (\mathbf{z}_0, \ldots, \mathbf{z}_t)$, all Bayesian inference on $\mathbf{x}_{0:t} = (\mathbf{x}_0, \ldots, \mathbf{x}_t)$ and $\mathbf{f}_{0:t} = (\mathbf{f}_0, \ldots, \mathbf{f}_t)$ is based on the posterior distribution $p(\mathbf{x}_{0:t}, \mathbf{f}_{0:t} \mid \mathbf{z}_{0:t})$. Here the channel coefficients $\mathbf{f}_t$ are regarded as nuisance parameters and integrated out.
Our aim is to compute recursively in time $t$ the marginal MAP (MMAP) symbol estimate defined as

$$\hat{\mathbf{x}}_t^{MMAP} = \arg\max_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{z}_{0:t}).$$

The joint distribution $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$ satisfies the following recursion

$$p(\mathbf{x}_{0:t+1} \mid \mathbf{z}_{0:t+1}) = p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t}) \times \frac{p(\mathbf{z}_{t+1} \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t+1})\, p(\mathbf{x}_{t+1} \mid \mathbf{x}_t)}{p(\mathbf{z}_{t+1} \mid \mathbf{z}_{0:t})}.$$

The likelihood term $p(\mathbf{z}_{t+1} \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t+1})$ can be evaluated pointwise through the Kalman filter associated to the path $\mathbf{x}_{0:t+1}$, as the system (1)-(2) is linear Gaussian conditional upon $\mathbf{x}_{0:t+1}$. It is easily seen that, given our assumptions, computing $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$ or $p(\mathbf{x}_t \mid \mathbf{z}_{0:t})$ requires a computational cost exponential in the (growing) number $t$ of observations. It is thus necessary to develop an approximation scheme.
Efficient batch algorithms have been developed to solve related estimation problems [9] but they are of limited interest in a digital communications framework. Several “classical” suboptimal algorithms have also been proposed to solve related problems in the literature; see for example [1] for a standard textbook on the subject. However, these approximation methods are notoriously unreliable and faults are difficult to diagnose on-line.
3 Particle Filtering
In this paper, we present an original particle filtering method to solve this optimal estimation problem.
3.1 Perfect Monte Carlo sampling
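Assume it is possible to sample $N$ i.i.d. samples, called particles, $\{\mathbf{x}_{0:t}^{(i)} : i = 1, \ldots, N\}$ according to the joint distribution $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$. Then an empirical distribution approximation of $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$ is given by

$$\hat{p}_N(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{x}_{0:t}^{(i)}}(\mathbf{x}_{0:t}).$$

Consequently an approximation of its marginal $p(\mathbf{x}_t \mid \mathbf{z}_{0:t})$ is given by

$$\hat{p}_N(\mathbf{x}_t \mid \mathbf{z}_{0:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{x}_t^{(i)}}(\mathbf{x}_t),$$

that is, for any $i \in X$,

$$\hat{p}_N(\mathbf{x}_t = i \mid \mathbf{z}_{0:t}) = \frac{1}{N} \sum_{j=1}^{N} \delta_{\mathbf{x}_t^{(j)}}(i) \qquad (3)$$

and

$$\hat{\mathbf{x}}_t^{MMAP} = \arg\max_{\mathbf{x}_t \in X} \hat{p}_N(\mathbf{x}_t \mid \mathbf{z}_{0:t}).$$

The estimate (3) is unbiased and, from the strong law of large numbers (SLLN), $\hat{p}_N(\mathbf{x}_t = i \mid \mathbf{z}_{0:t}) \to p(\mathbf{x}_t = i \mid \mathbf{z}_{0:t})$ almost surely as $N \to +\infty$. A central limit theorem (CLT) holds too. The main advantage of Monte Carlo methods over other numerical integration methods is that the rate of convergence of $\hat{p}_N(\mathbf{x}_t = i \mid \mathbf{z}_{0:t})$ towards $p(\mathbf{x}_t = i \mid \mathbf{z}_{0:t})$ is independent of the dimension $t$. Unfortunately, it is not possible to sample directly from the distribution $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$ at any $t$, and alternative strategies need to be investigated.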
3.2 Sequential Bayesian Importance Sampling
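An alternative solution to estimate $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$ consists of using the importance sampling method. Suppose that $N$ i.i.d. samples $\{\mathbf{x}_{0:t}^{(i)} : i = 1, \ldots, N\}$ can be easily simulated according to an arbitrary importance distribution $\pi(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$, such that $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t}) > 0$ implies $\pi(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t}) > 0$. Using this distribution, a Monte Carlo estimate of $p(\mathbf{x}_t \mid \mathbf{z}_{0:t})$ may be obtained as

$$\hat{p}_N(\mathbf{x}_t = i \mid \mathbf{z}_{0:t}) = \sum_{j=1}^{N} \tilde{w}_{0:t}^{(j)}\, \delta_{\mathbf{x}_t^{(j)}}(i), \qquad (4)$$

where $\tilde{w}_{0:t}^{(i)} \propto w(\mathbf{x}_{0:t}^{(i)})$ ($\sum_{j=1}^{N} \tilde{w}_{0:t}^{(j)} = 1$) is the normalised version of the importance weight $w(\mathbf{x}_{0:t}^{(i)})$ defined as

$$w(\mathbf{x}_{0:t}) \propto \frac{p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})}{\pi(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})}.$$

According to the SLLN, $\hat{p}_N(\mathbf{x}_t = i \mid \mathbf{z}_{0:t})$ converges almost surely towards $p(\mathbf{x}_t = i \mid \mathbf{z}_{0:t})$ as $N \to +\infty$, and under additional assumptions a CLT also holds.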
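The method described up to now is a batch method. In order to obtain the estimate of $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$ sequentially, one should be able to propagate this estimate in time without subsequently modifying the past simulated trajectories $\{\mathbf{x}_{0:t-1}^{(i)} : i = 1, \ldots, N\}$. This means that $\pi(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$ should admit $\pi(\mathbf{x}_{0:t-1} \mid \mathbf{z}_{0:t-1})$ as marginal distribution:

$$\pi(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t}) = \pi(\mathbf{x}_{0:t-1} \mid \mathbf{z}_{0:t-1})\, \pi(\mathbf{x}_t \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t-1}),$$

and the importance weights $w(\mathbf{x}_{0:t})$ can then be evaluated recursively, i.e.

$$w(\mathbf{x}_{0:t}) = w(\mathbf{x}_{0:t-1}) \times w_t, \qquad (5)$$

where

$$w_t \propto \frac{p(\mathbf{z}_t \mid \mathbf{z}_{0:t-1}, \mathbf{x}_{0:t})\, p(\mathbf{x}_t \mid \mathbf{x}_{t-1})}{\pi(\mathbf{x}_t \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t-1})}.$$

There are an unlimited number of choices for the importance distribution $\pi(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$, the only restriction being that its support includes that of $p(\mathbf{x}_{0:t} \mid \mathbf{z}_{0:t})$. A sensible selection criterion is to choose a proposal that minimises the variance of the importance weights given $\mathbf{x}_{0:t-1}$ and $\mathbf{z}_{0:t}$. The importance distribution that satisfies this condition is $\pi(\mathbf{x}_t \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t-1}) = p(\mathbf{x}_t \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t-1})$, and this “optimal” importance distribution is employed throughout the paper (see [7] for details).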
3.3 Selection step
For importance distributions of the form specified by (5), the variance of the importance weights can only increase (stochastically) over time [7]. It is thus impossible to avoid a degeneracy phenomenon. Practically, after a few iterations of the algorithm, all but one of the normalised importance weights are very close to zero, and a large computational effort is devoted to updating trajectories whose contribution to the final estimate is almost zero. To avoid this, it is of crucial importance to include a selection step in the algorithm, the purpose of which is to discard particles with low normalised importance weights and multiply those with high normalised importance weights. The weights of the “surviving” particles are reset to $1/N$. A selection procedure associates with each particle, say $\tilde{\mathbf{x}}_{0:t}^{(i)}$, $i = 1, \ldots, N$, a number of children $N_i \in \mathbb{N}$, such that $\sum_{i=1}^{N} N_i = N$, to obtain $N$ new particles $\{\mathbf{x}_{0:t}^{(i)} : i = 1, \ldots, N\}$. If $N_i = 0$ then $\tilde{\mathbf{x}}_{0:t}^{(i)}$ is discarded, otherwise it has $N_i$ children at time $t + 1$. In this paper, the selection step is done according to a stratified sampling scheme [12], though other methods such as sampling importance resampling (SIR) [11] may be employed. The stratified sampling scheme proceeds as follows: generate $N$ points equally spaced in the interval $[0, 1]$, and associate with each particle $i$ a number of children $N_i$ equal to the number of points lying between the partial sums of weights $q_{i-1}$ and $q_i$, where $q_i = \sum_{j=1}^{i} \tilde{w}_t^{(j)}$. This algorithm is such that $E[N_i] = N \tilde{w}_t^{(i)}$ and $\mathrm{var}[N_i] = \{N \tilde{w}_t^{(i)}\} \left(1 - \{N \tilde{w}_t^{(i)}\}\right)$ where, for any $a$, $\lfloor a \rfloor$ is the integer part of $a$ and $\{a\} = a - \lfloor a \rfloor$.
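The selection rule described above can be sketched in a few lines. The version below assumes NumPy and makes one choice that the text leaves open: the $N$ equally spaced points share a single uniform random offset. The function name `children_counts` and the example weights are hypothetical.

```python
import numpy as np

def children_counts(weights, rng):
    """Number of children per particle under equally spaced sampling points.

    weights : (N,) normalised importance weights (non-negative, sum to 1).
    The N points (u + i) / N, i = 0..N-1, share one uniform offset u
    (an assumption of this sketch). Particle i receives as many children
    as there are points between the partial sums q_{i-1} and q_i.
    """
    N = len(weights)
    points = (rng.uniform() + np.arange(N)) / N
    q = np.cumsum(weights)
    q[-1] = 1.0                                  # guard against round-off
    owner = np.searchsorted(q, points, side="right")
    return np.bincount(owner, minlength=N)

rng = np.random.default_rng(3)
N = 200
raw = rng.exponential(size=N)
weights = raw / raw.sum()
counts = children_counts(weights, rng)
```

Each count differs from $N \tilde{w}_t^{(i)}$ by less than one, so particles with negligible weight are discarded and heavy particles are duplicated roughly in proportion to their weight.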
3.3.1 Algorithm
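Given at time $t - 1$, $N \in \mathbb{N}^*$ random samples $\mathbf{x}_{0:t-1}^{(i)}$ ($i = 1, \ldots, N$) distributed according to $p(\mathbf{x}_{0:t-1} \mid \mathbf{z}_{0:t-1})$, the MC filter proceeds as follows at time $t$.
Particle Filtering Algorithm
Sequential Importance Sampling step
• For $i = 1, \ldots, N$, sample $\tilde{\mathbf{x}}_t^{(i)} \sim \pi(\mathbf{x}_t \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t-1}^{(i)})$ and set $\tilde{\mathbf{x}}_{0:t}^{(i)} = (\mathbf{x}_{0:t-1}^{(i)}, \tilde{\mathbf{x}}_t^{(i)})$.
• For $i = 1, \ldots, N$, evaluate the importance weights up to a normalising constant:

$$w_t^{(i)} \propto \frac{p(\mathbf{z}_t \mid \mathbf{z}_{0:t-1}, \tilde{\mathbf{x}}_{0:t}^{(i)})\, p(\tilde{\mathbf{x}}_t^{(i)} \mid \mathbf{x}_{t-1}^{(i)})}{\pi(\tilde{\mathbf{x}}_t^{(i)} \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t-1}^{(i)})}$$

and normalise them: $\tilde{w}_t^{(i)} \propto w_t^{(i)}$, $\sum_{j=1}^{N} \tilde{w}_t^{(j)} = 1$.
Selection step
• Multiply/discard particles ($\tilde{\mathbf{x}}_{0:t}^{(i)}$; $i = 1, \ldots, N$) with respect to high/low normalised importance weights $\tilde{w}_t^{(i)}$ to obtain $N$ particles ($\mathbf{x}_{0:t}^{(i)}$; $i = 1, \ldots, N$).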
Clearly, the computational complexity of the proposed algorithm at each iteration is $O(N)$. Moreover, since the optimal and prior importance distributions $\pi(\mathbf{x}_t \mid \mathbf{z}_{0:t}, \mathbf{x}_{0:t-1})$ and the associated importance weights depend on $\mathbf{x}_{0:t-1}$ via a set of low-dimensional sufficient statistics, only these values need to be kept in memory and, thus, the storage requirements for the proposed algorithm are also $O(N)$ and do not increase over time.
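The structure of the algorithm (sampling from the optimal importance distribution, weighting, selection) can be illustrated on a deliberately simplified toy problem. The sketch below assumes NumPy and a scalar two-state Markov symbol observed through a known real gain in Gaussian noise, so the likelihood is available in closed form and no Kalman filter is needed; this is not the fading-channel model of Section 2. All function names and parameter values are hypothetical. Because the toy model is a finite hidden Markov model, the exact filter is also computed for comparison.

```python
import numpy as np

def gauss_lik(z, mean, sigma):
    # Gaussian likelihood up to a constant factor (it cancels on normalising).
    return np.exp(-0.5 * ((z - mean) / sigma) ** 2)

def particle_filter(z, P, states, h, sigma, N, rng):
    """Toy particle filter; returns estimates of p(x_t = state | z_{0:t})."""
    T, S = len(z), len(states)
    post = np.zeros((T, S))
    p0 = gauss_lik(z[0], h * states, sigma)
    p0 /= p0.sum()                               # uniform prior at t = 0
    x = rng.choice(S, size=N, p=p0)
    post[0] = np.bincount(x, minlength=S) / N
    for t in range(1, T):
        lik = gauss_lik(z[t], h * states, sigma)
        joint = P[x] * lik                       # p(x_t | x_{t-1}) p(z_t | x_t)
        incr = joint.sum(axis=1)                 # weight: p(z_t | x_{t-1})
        cdf = np.cumsum(joint / incr[:, None], axis=1)
        u = rng.uniform(size=N)
        x = np.minimum((cdf < u[:, None]).sum(axis=1), S - 1)
        w = incr / incr.sum()                    # previous weights were 1/N
        post[t] = np.bincount(x, weights=w, minlength=S)
        # Selection with equally spaced points; weights are reset to 1/N.
        points = (rng.uniform() + np.arange(N)) / N
        q = np.cumsum(w)
        q[-1] = 1.0
        x = x[np.searchsorted(q, points, side="right")]
    return post

def exact_filter(z, P, states, h, sigma):
    """Exact forward recursion, feasible only because the toy model is an HMM."""
    T, S = len(z), len(states)
    post = np.zeros((T, S))
    a = gauss_lik(z[0], h * states, sigma)
    post[0] = a / a.sum()
    for t in range(1, T):
        a = (post[t - 1] @ P) * gauss_lik(z[t], h * states, sigma)
        post[t] = a / a.sum()
    return post

rng = np.random.default_rng(4)
states = np.array([-1.0, 1.0])
P = np.array([[0.8, 0.2], [0.2, 0.8]])           # P[i, j] = p(x_t = j | x_{t-1} = i)
T, h, sigma = 50, 1.0, 0.7
x_true = np.zeros(T, dtype=int)
x_true[0] = rng.integers(2)
for t in range(1, T):
    x_true[t] = rng.choice(2, p=P[x_true[t - 1]])
z = h * states[x_true] + sigma * rng.standard_normal(T)
post_pf = particle_filter(z, P, states, h, sigma, 2000, rng)
post_exact = exact_filter(z, P, states, h, sigma)
x_mmap = states[post_pf.argmax(axis=1)]          # marginal MAP symbol estimates
```

In the fading-channel setting of the paper, the closed-form likelihood used here would be replaced by the Kalman-filter likelihood attached to each particle's symbol path, and each particle would carry the corresponding low-dimensional sufficient statistics.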
3.3.2 Convergence Results
The following proposition is a straightforward consequence of Theorem 1 in [4], which itself is an extension of results in [3].
Proposition 1 For all $t \geq 0$, there exists $c_t$ independent of $N$ such that

$$E\left[ \left( \hat{p}_N(\mathbf{x}_t = i \mid \mathbf{z}_{0:t}) - p(\mathbf{x}_t = i \mid \mathbf{z}_{0:t}) \right)^2 \right] \leq \frac{c_t}{N}.$$

The expectation operator is with respect to the randomness introduced in the particle filtering method. Though the particles are interacting, one observes that the “standard” rate of convergence of Monte Carlo methods is preserved.
4 Simulation Results
We demonstrate the performance of our multi-user MAP decoder for transmission of binary phase-shift-keyed (BPSK) symbols over fast fading CDMA channels. The simulation parameters were as follows: $M = 3$, $G = 10$ and a flat fading channel with fading rate $0.05/T$. We compared our results with [6] and with the case where the channel is assumed known exactly. The results in terms of Bit Error Rate (BER) are presented in Fig. 1. We notice that when the SNR is large, our stochastic algorithm substantially outperforms that of [6]. Their deterministic algorithm can indeed get trapped in severe local maxima, as the posterior distribution is more sharply peaked at high SNR.
Figure 1: Dotted line + (channel known), solid line (particle filtering), dotted line x ([6]).
[3] D. Crisan, P. Del Moral and T. Lyons, “Discrete filtering
using branching and interacting particle systems”, Markov
Processes and Related Fields, vol. 5, no. 3, pp. 293-318,
1999.
[4] D. Crisan and A. Doucet, “Convergence of generalized par¬
ticle filters”, technical report, Cambridge University, TR-F-
INFENG TR 381, 2000.
[5] L.M. Davis and I.B. Collings, “Joint MAP detection and
channel estimation for CDMA over frequency-selective fad¬
ing channels”, in Proc. ISPACS-98, pp. 432-436, 1998.
[6] L.M. Davis and I.B. Collings, “Multi-user MAP decoding
for flat-fading CDMA channels”, in Proc. Conf. DSPCS-99,
pp. 79-86, 1999.
[7] A. Doucet, J.F.G. de Freitas and N.J. Gordon (eds.), Se¬
quential Monte Carlo Methods in Practice, Springer- Verlag:
New- York, 2000.
[8] A. Doucet, S.J. Godsill and C. Andrieu, “On sequential
Monte Carlo sampling methods for Bayesian filtering”,
Statistics and Computing, vol. 10, no. 3, pp. 197-208, 2000.
[9] A. Doucet, A. Logothetis and V. Krishnamurthy, “Stochas¬
tic sampling algorithms for state estimation of jump Markov
linear systems”, IEEE Trans. Automatic Control, vol. 45, no.
2, pp. 188-201,2000.
[10] A. Doucet, N.J. Gordon and V. Krishnamurthy, “Particle fil¬
ters for state estimation of jump Markov linear systems”,
technical report, Cambridge University, TR-F-INFENG TR
359, 1999.
[11] N.J. Gordon, D.J. Salmond and A.F.M. Smith, “Novel ap¬
proach to nonlinear/non-Gaussian Bayesian state estima¬
tion”, IEE Proceedings-F, vol. 140, no. 2, pp. 107-113, 1993.
[12] G. Kitagawa, “Monte Carlo Filter and Smoother for Non-
Gaussian Nonlinear State Space Models”, J. Comp. Graph.
Stat., vol. 5, no. 1, pp. 1-25, 1996.
[13] U. Madhow, “Blind adaptive interference suppression for
direct-sequence CDMA”, Proceedings of the IEEE, pp.
2049-2069, 1998.
[14] E. Punskaya, C. Andrieu, A. Doucet and W.J. Fitzgerald,
“Particle filters for demodulation of M-ary modulated sig¬
nals in noisy fading communication channels”, in Proceed¬
ings Conf. ICASSP 2000.
[15] E. Punskaya, C. Andrieu, A. Doucet and W.J. Fitzgerald,
“Particle filtering for demodulation in fading channels”,
technical report Cambridge University CUED-F-INFENG
TR 381, 2000.
[16] S. Verdu, “Minimum probability of error for asynchronous
Gaussian multiple access channels”, IEEE Trans. Informa¬
tion Theory, vol. 32, no. 1 , pp. 85-96, 1986.
5 REFERENCES
[1] B.D.O. Anderson and J.B. Moore, Optimal Filtering,
Prentice-Hall, Englewood Cliffs, 1979.
[2] C. Andrieu, A. Doucet and E. Punskaya, “Sequential Monte
Carlo methods for optimal filtering”, in [7],
ANALYSIS OF A SUBSPACE CHANNEL ESTIMATION TECHNIQUE FOR
MULTICARRIER CDMA SYSTEMS
Carlos J. Escudero, Daniel I. Iglesia, Monica F. Bugallo, Luis Castedo
Departamento de Electronica y Sistemas. Universidad de La Coruna
Campus de Elvina s/n, 15.071 La Coruna, SPAIN
Tel: ++ 34-981-167150, e-mail: escudero@des.fi.udc.es
ABSTRACT
In this paper we investigate a blind channel estimation method for Multi-Carrier CDMA systems that uses a subspace decomposition technique. This technique exploits the orthogonality between the noise subspace and the received user codes to obtain a channel identification algorithm. To analyze the performance of this algorithm, we derive a theoretical expression for the estimation MSE using a perturbation approach. This expression is compared with the results of computer simulations to illustrate the validity of the analysis.
1. INTRODUCTION
Multi-Carrier (MC) transmission methods for Code Division Multiple Access (CDMA) communication systems have recently been proposed as an efficient technique to combat multipath propagation and have attracted growing interest in recent years [1, 2]. In these techniques each user is assigned a unique identification code sequence and the transmitted signal is split across different subcarriers. It is assumed that the subcarrier bandwidth is smaller than the channel coherence bandwidth, so that each subcarrier experiences only flat fading. As a consequence, MC-CDMA systems do not suffer from Inter-Symbol Interference (ISI). However, the effects of dispersive channels appear as random distortions in the amplitude and phase of each subcarrier. This causes a loss of orthogonality between user codes and introduces Multiple Access Interference (MAI).
In order to implement a multiuser detector and to reduce MAI it is necessary to characterize, implicitly or explicitly, the channel parameters. In this paper we introduce a new blind channel estimation technique that is based on a subspace decomposition [3] and derive a particular algorithm to identify the channel parameters. We also obtain, using perturbation techniques, an approximate expression of the estimation Mean Square Error (MSE) achieved with the proposed algorithm.
This work has been supported by FEDER (grant 1FD97-0082).
The paper is organized as follows. Section 2 presents the signal model of a synchronous MC-CDMA system. Section 3 describes the subspace decomposition technique and the resultant algorithm. In Section 4 we perform the theoretical analysis of the estimation MSE. Section 5 shows the results of several computer simulations that illustrate the validity of the approximations in the previous section and, finally, Section 6 is devoted to the conclusions.
2. SIGNAL MODEL
Let us consider a discrete-time baseband equivalent model of a synchronous MC-CDMA system with $N$ users using $L$-chip signature codes. The $k$-th chip corresponding to the $n$-th symbol transmitted by the $i$-th user is given by

$$v_n^i(k) = s_n^i\, c_i(k), \qquad k = 0, \dots, L-1, \quad n = 0, 1, 2, \dots \qquad (1)$$

where $s_n^i$ is the $n$-th symbol of the $i$-th user and $c_i(k)$ is the $k$-th chip of the $i$-th user code. In a MC-CDMA system the modulator computes the $L$-point IDFT (Inverse Discrete Fourier Transform) of (1) to obtain the following multicarrier signal

$$V_n^i(m) = \mathrm{IDFT}[v_n^i(k)] = \frac{1}{L}\sum_{k=0}^{L-1} v_n^i(k)\, e^{j\frac{2\pi}{L}km} \qquad (2)$$

Figure 1: Block diagram of the discrete-time baseband model of a MC-CDMA system.
0-7803-5988-7/00/$10.00 © 2000 IEEE

This signal is transmitted through a dispersive channel with an impulse response $h_i(m)$, $m = 0, \dots, M-1$. At the receiver the observed signal is a superposition of the signals corresponding to the $N$ users plus additive white Gaussian noise (AWGN). Therefore, the received signal for the $n$-th symbol is the following

$$X_n(m) = \sum_{i=1}^{N} V_n^i(m) * h_i(m) + r_n(m) \qquad (3)$$

where $*$ denotes discrete convolution and $r_n(m)$ represents a white noise sequence.
To recover the transmitted symbols, the receiver applies an $L$-point DFT (Discrete Fourier Transform) to the received signal (3). Assuming perfect synchronization and a sufficiently large guard time between symbols, the resultant signal is

$$x_n(k) = \mathrm{DFT}[X_n(m)] = \sum_{i=1}^{N} v_n^i(k) H_i(k) + r_n(k) = \sum_{i=1}^{N} s_n^i\, c_i(k) H_i(k) + r_n(k), \qquad k = 0, \dots, L-1 \qquad (4)$$

where $H_i(k)$ and $r_n(k)$ are the DFTs of $h_i(m)$ and $r_n(m)$, respectively (the arguments $k$ and $m$ distinguish the frequency-domain and time-domain sequences). Rewriting (4) in vector notation we obtain

$$\mathbf{x}_n = [x_n(0), \dots, x_n(L-1)]^T = \sum_{i=1}^{N} s_n^i \mathbf{C}_i \mathbf{H}_i + \mathbf{r}_n = \sum_{i=1}^{N} s_n^i \mathbf{C}_i \mathbf{F} \mathbf{h}_i + \mathbf{r}_n = \sum_{i=1}^{N} s_n^i\, \tilde{\mathbf{c}}_i + \mathbf{r}_n \qquad (5)$$

where $T$ denotes transposition, $\mathbf{C}_i$ is a diagonal matrix whose elements are the $L$ chips of the code corresponding to the $i$-th user, $\mathbf{H}_i = [H_i(0), \dots, H_i(L-1)]^T$ and $\mathbf{r}_n = [r_n(0), \dots, r_n(L-1)]^T$. To obtain (5) we have used the relationship $\mathbf{H}_i = \mathbf{F}\mathbf{h}_i$, where $\mathbf{F}$ is an $L \times M$ DFT matrix and $\mathbf{h}_i = [h_i(0), \dots, h_i(M-1)]^T$. Note that (5) is a CDMA signal where the code associated with the $i$-th user is $\tilde{\mathbf{c}}_i = \mathbf{C}_i \mathbf{F} \mathbf{h}_i$.
3. SUBSPACE DECOMPOSITION
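Assuming statistical independence between users and noise, the autocorrelation matrix of the observation vector (5) can be decomposed as

$$\mathbf{R} = E[\mathbf{x}_n \mathbf{x}_n^H] = \sum_{i=1}^{N} \tilde{\mathbf{c}}_i\, E[s_n^i (s_n^i)^*]\, \tilde{\mathbf{c}}_i^H + E[\mathbf{r}_n \mathbf{r}_n^H] = \sum_{i=1}^{N} \sigma_i^2\, \tilde{\mathbf{c}}_i \tilde{\mathbf{c}}_i^H + \sigma_r^2 \mathbf{I} \qquad (6)$$

where $E[\cdot]$ is the expectation operator, $*$ represents conjugation, $H$ denotes conjugate transpose, $\mathbf{I}$ is the identity matrix, and $\sigma_i^2$ and $\sigma_r^2$ are the $i$-th user signal power and the noise power, respectively.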
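The per-subcarrier model of (4) and (5) can be checked numerically. The following is a minimal sketch, not the authors' simulation code: it assumes a cyclic guard interval, so that the channel acts as a circular convolution over the $L$ retained samples, and it uses a hypothetical random binary code, a random channel and a single noise-free user.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 12, 4                                   # subcarriers and channel length

c = rng.choice([-1.0, 1.0], size=L)            # hypothetical user code c_i(k)
s = 1.0                                        # one transmitted symbol s_n^i
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # taps h_i(m)

v = s * c                                      # chips, eq. (1)
V = np.fft.ifft(v)                             # L-point IDFT with 1/L factor, eq. (2)

# Noise-free channel output. Assumption: a cyclic guard interval makes the
# convolution in (3) circular over the L samples kept by the receiver.
h_padded = np.concatenate([h, np.zeros(L - M)])
X = np.array([sum(V[(m - p) % L] * h_padded[p] for p in range(L))
              for m in range(L)])

x_time_path = np.fft.fft(X)                    # receiver L-point DFT, eq. (4)

# Frequency-domain model of eq. (5): x = s * C F h, with F the L x M DFT matrix.
F = np.exp(-2j * np.pi * np.outer(np.arange(L), np.arange(M)) / L)
x_model = s * np.diag(c) @ F @ h

max_abs_error = np.max(np.abs(x_time_path - x_model))
```

The two paths agree to rounding error, which is why each subcarrier sees only a complex gain $H_i(k)$ and no inter-symbol interference under this assumption.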
Let us consider the eigendecomposition of (6). There are $L$ eigenvalues that we sort as $\lambda_0 \ge \lambda_1 \ge \dots \ge \lambda_{L-1}$. It is well known that the eigenvectors associated with the $N$ most significant eigenvalues ($\mathbf{u}_l$, $l = 0, \dots, N-1$) span the signal subspace where the perturbed user codes, $\tilde{\mathbf{c}}_i$, lie. The remaining $L - N$ eigenvectors ($\mathbf{u}_l$, $l = N, \dots, L-1$) span the noise (orthogonal) subspace and their associated eigenvalues are equal to the noise power, i.e., $\lambda_N = \dots = \lambda_{L-1} = \sigma_r^2$ [3].
As we have seen, the perturbed user codes lie in the signal subspace and are orthogonal to the noise subspace. This property can be used to state the following system of equations for the $i$-th user

$$\tilde{\mathbf{c}}_i^H \mathbf{u}_l = 0, \qquad l = N, \dots, L-1 \qquad (7)$$

Note that this system of equations has $M$ unknowns and $L - N$ equations. It will be solvable if and only if the number of equations is greater than or equal to the number of unknowns, $M \le L - N$. This means that the number of simultaneous users, $N$, is limited by the number of carriers, $L$, and the channel length, $M$. Nevertheless, it is interesting to note that the system capacity can be increased without increasing the number of carriers by using codes with a length larger than the spreading gain [4].
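In order to solve the system of equations (7), we can consider the following equivalent system

$$\|\tilde{\mathbf{c}}_i^H \mathbf{u}_l\|^2 = \tilde{\mathbf{c}}_i^H \mathbf{u}_l \mathbf{u}_l^H \tilde{\mathbf{c}}_i = \mathbf{h}_i^H \mathbf{F}^H \mathbf{C}_i^H \mathbf{u}_l \mathbf{u}_l^H \mathbf{C}_i \mathbf{F} \mathbf{h}_i = 0 \qquad (8)$$

for $l = N, \dots, L-1$. The solution to these equations can be found by solving the following minimization problem

$$\begin{aligned}
\hat{\mathbf{h}}_i &= \arg\min_{\|\mathbf{h}_i\|^2 = 1} \sum_{l=N}^{L-1} \mathbf{h}_i^H \mathbf{F}^H \mathbf{C}_i^H \mathbf{u}_l \mathbf{u}_l^H \mathbf{C}_i \mathbf{F} \mathbf{h}_i \\
&= \arg\min_{\|\mathbf{h}_i\|^2 = 1} \mathbf{h}_i^H \left[ \sum_{l=N}^{L-1} \mathbf{F}^H \mathbf{C}_i^H \mathbf{u}_l \mathbf{u}_l^H \mathbf{C}_i \mathbf{F} \right] \mathbf{h}_i \\
&= \arg\min_{\|\mathbf{h}_i\|^2 = 1} \mathbf{h}_i^H \left[ \mathbf{F}^H \mathbf{C}_i^H \mathbf{U} \mathbf{U}^H \mathbf{C}_i \mathbf{F} \right] \mathbf{h}_i \\
&= \arg\min_{\|\mathbf{h}_i\|^2 = 1} \mathbf{h}_i^H \mathbf{Q}_i \mathbf{h}_i
\end{aligned} \qquad (9)$$

where the solution $\hat{\mathbf{h}}_i$ is an estimate of the channel impulse response vector, $\mathbf{U}$ is an $L \times (L - N)$ matrix whose columns are the eigenvectors associated with the noise subspace (i.e., $\mathbf{u}_l$, $l = N, \dots, L-1$) and $\mathbf{Q}_i = \mathbf{F}^H \mathbf{C}_i^H \mathbf{U} \mathbf{U}^H \mathbf{C}_i \mathbf{F}$. The solution can be obtained by the least squares method and it corresponds to the eigenvector of $\mathbf{Q}_i$ associated with its minimum eigenvalue [5].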
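The minimization above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the Walsh-Hadamard codes, the unit user powers and the noise power 0.1 are all hypothetical choices, and the exact autocorrelation matrix of (6) is used in place of an estimate.

```python
import numpy as np

def dft_matrix(L, M):
    """First M columns of the L-point DFT matrix (the L x M matrix F)."""
    return np.exp(-2j * np.pi * np.outer(np.arange(L), np.arange(M)) / L)

def subspace_channel_estimate(R, code, N, M):
    """Unit-norm channel estimate for one user, defined up to a complex scalar."""
    L = R.shape[0]
    _, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    U = eigvecs[:, :L - N]                       # noise subspace: L - N smallest
    B = U.conj().T @ np.diag(code) @ dft_matrix(L, M)   # U^H C_i F
    Q = B.conj().T @ B                           # Q_i = F^H C_i^H U U^H C_i F
    _, q_vecs = np.linalg.eigh(Q)
    return q_vecs[:, 0]                          # minimum-eigenvalue eigenvector

# Toy scenario (all values hypothetical): Walsh-Hadamard codes, unit-power users.
L, N, M = 16, 4, 4
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
codes = np.kron(np.kron(np.kron(H2, H2), H2), H2)[:N]   # N codes of length 16
rng = np.random.default_rng(1)
channels = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
F = dft_matrix(L, M)
perturbed = np.array([np.diag(codes[i]) @ F @ channels[i] for i in range(N)])
R = perturbed.T @ perturbed.conj() + 0.1 * np.eye(L)    # exact R with sigma_r^2 = 0.1

h_hat = subspace_channel_estimate(R, codes[0], N, M)
h_true = channels[0]
# Cosine of the angle between estimate and truth; 1 means equal up to a scalar.
alignment = abs(np.vdot(h_hat, h_true)) / np.linalg.norm(h_true)
```

With the exact matrix the estimate is collinear with the true channel, which illustrates that the method identifies the channel only up to a complex constant.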
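In practice, we do not know the autocorrelation matrix (6) a priori. However, it can be estimated from the sample autocorrelation matrix as

$$\hat{\mathbf{R}} = \frac{1}{N_s} \sum_{n=1}^{N_s} \mathbf{x}_n \mathbf{x}_n^H \qquad (10)$$

where $N_s$ is the number of received symbols used to obtain the estimate. Note that $\hat{\mathbf{R}} \to \mathbf{R}$ as $N_s$ tends to infinity, and likewise its eigenvalues $\hat{\lambda}_l \to \lambda_l$ and eigenvectors $\hat{\mathbf{u}}_l \to \mathbf{u}_l$.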
Finally, when using second order statistics, the channel impulse response can only be obtained up to a complex constant. This constant has to be compensated in order to analyze the algorithm performance. To this end, we normalize the estimate of the impulse response vector as $\hat{\mathbf{h}}_{i,\mathrm{normalized}} = \hat{\mathbf{h}}_i\, h_i(0) / \hat{h}_i(0)$, where $h_i(0)$ and $\hat{h}_i(0)$ are the first elements of the true and estimated channel impulse response vectors, respectively.
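A short first-order expansion shows what this normalization does to the estimation error. It is only a sketch, assuming the estimate is close to the true channel; the symbols $\Delta\mathbf{h}_i = \hat{\mathbf{h}}_i - \mathbf{h}_i$ and $\mathbf{e}_1 = [1, 0, \dots, 0]^T$ are introduced here for illustration.

```latex
\begin{aligned}
\hat{\mathbf{h}}_i \frac{h_i(0)}{\hat{h}_i(0)}
  &= (\mathbf{h}_i + \Delta\mathbf{h}_i)\,\frac{h_i(0)}{h_i(0) + \Delta h_i(0)} \\
  &\approx (\mathbf{h}_i + \Delta\mathbf{h}_i)\left(1 - \frac{\Delta h_i(0)}{h_i(0)}\right) \\
  &\approx \mathbf{h}_i + \left(\mathbf{I} - \frac{\mathbf{h}_i \mathbf{e}_1^T}{h_i(0)}\right)\Delta\mathbf{h}_i .
\end{aligned}
```

The normalized error therefore has a zero first element, and the part of the raw error that is proportional to the first-tap mismatch is removed.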
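4. MEAN SQUARE ERROR ANALYSIS
In this section, we derive an analytical expression of the estimation MSE. For simplicity, let us denote $\mathbf{h}_i = \mathbf{h}$, $\mathbf{Q}_i = \mathbf{Q}$ and $\mathbf{C}_i = \mathbf{C}$. Our analysis is based on a perturbation technique [7] that allows us to express the perturbation in $\mathbf{h}$, $\Delta\mathbf{h}$, in terms of the perturbation in $\mathbf{Q}$, $\Delta\mathbf{Q}$. Let us consider the following identities

$$\mathbf{Q}\mathbf{h} = \mathbf{0}, \qquad \hat{\mathbf{h}} = \mathbf{h} + \Delta\mathbf{h}, \qquad \hat{\mathbf{Q}} = \mathbf{Q} + \Delta\mathbf{Q} \qquad (11)$$

For a sufficiently large number of samples ($N_s \to \infty$), $\hat{\mathbf{Q}} \to \mathbf{Q}$, $\hat{\mathbf{h}} \to \mathbf{h}$ and $\hat{\mathbf{Q}}\hat{\mathbf{h}}$ is approximately equal to the zero vector, i.e.

$$\hat{\mathbf{Q}}\hat{\mathbf{h}} = (\mathbf{Q} + \Delta\mathbf{Q})(\mathbf{h} + \Delta\mathbf{h}) \approx \Delta\mathbf{Q}\,\mathbf{h} + \mathbf{Q}\,\Delta\mathbf{h} \approx \mathbf{0}$$

where we have neglected the second order term, $\Delta\mathbf{Q}\,\Delta\mathbf{h} \approx \mathbf{0}$. Therefore,

$$\mathbf{Q}\,\Delta\mathbf{h} \approx -\Delta\mathbf{Q}\,\mathbf{h} \qquad (12)$$

and

$$\Delta\mathbf{h} \approx -\mathbf{Q}^{\dagger} \Delta\mathbf{Q}\,\mathbf{h} = -\mathbf{Q}^{\dagger}(\hat{\mathbf{Q}} - \mathbf{Q})\mathbf{h} = -\mathbf{Q}^{\dagger}\hat{\mathbf{Q}}\mathbf{h} \qquad (13)$$

where $\mathbf{Q}^{\dagger}$ denotes the left pseudo-inverse of $\mathbf{Q}$. The $k$-th component of $\Delta\mathbf{h}$ is given by

$$\begin{aligned}
\Delta h(k) &\approx -\mathbf{q}_k^H \hat{\mathbf{Q}} \mathbf{h} \\
&= -\mathbf{q}_k^H (\mathbf{F}^H \mathbf{C}^H \hat{\mathbf{U}} \hat{\mathbf{U}}^H \mathbf{C} \mathbf{F}) \mathbf{h} \\
&= -\sum_{l=N}^{L-1} \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \hat{\mathbf{u}}_l \hat{\mathbf{u}}_l^H \mathbf{C} \mathbf{F} \mathbf{h} \\
&= -\sum_{l=N}^{L-1} \hat{\mathbf{u}}_l^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \hat{\mathbf{u}}_l \\
&= -\mathrm{Trace}\{\hat{\mathbf{U}} \hat{\mathbf{U}}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H\}
\end{aligned} \qquad (14)$$

where $\mathbf{q}_k$ is the $k$-th column of $(\mathbf{Q}^{\dagger})^H$. Based on the results of [6] (page 1840, equation (4.11)), we obtain the following identity

$$\hat{\mathbf{U}} \hat{\mathbf{U}}^H \mathbf{C} \mathbf{F} \mathbf{h} \approx -\mathbf{U} \mathbf{U}^H \Delta\mathbf{V}\, \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h} \qquad (15)$$

where $\mathbf{V}$ is an $L \times N$ matrix whose columns are the eigenvectors associated with the signal subspace (i.e., $\mathbf{u}_l$, $l = 0, \dots, N-1$) and $\Delta\mathbf{V} = \hat{\mathbf{V}} - \mathbf{V}$. Moreover, from Appendix A of [6] (page 1844, equation (A.2))

$$\mathbf{U}^H \Delta\mathbf{V} \approx \mathbf{U}^H \hat{\mathbf{R}} \mathbf{V} \boldsymbol{\Lambda}^{-1} \qquad (16)$$

where $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_0 - \sigma_r^2, \dots, \lambda_{N-1} - \sigma_r^2)$ and $\mathrm{diag}(\mathbf{a})$ is a diagonal matrix whose elements are the elements of the vector $\mathbf{a}$. To remove the effect of the unknown constant that we have in the estimation of the channel vector, we have to consider a normalization of the channel vector estimate. Similarly to [7], we select the following normalization

$$\Delta\mathbf{h}_{\mathrm{normalized}} = \left(\mathbf{I} - \frac{\mathbf{h}\,\mathbf{1}^T}{h(0)}\right) \Delta\mathbf{h} \qquad (17)$$

where $\mathbf{I}$ is the identity matrix and $\mathbf{1}^T = [1, 0, 0, \dots]$. This normalization can be included in (13), and now $\mathbf{q}_k$ will be the $k$-th column of the matrix $\left(\left(\mathbf{I} - \mathbf{h}\mathbf{1}^T / h(0)\right)\mathbf{Q}^{\dagger}\right)^H$. Combining (15) and (16) in (14), we obtain the following expression

$$\begin{aligned}
\Delta h(k) &\approx \mathrm{Trace}\{\mathbf{U} \mathbf{U}^H \hat{\mathbf{R}} \mathbf{V} \boldsymbol{\Lambda}^{-1} \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H\} \\
&= \sum_{l=N}^{L-1} \left(\mathbf{u}_l^H \hat{\mathbf{R}} \mathbf{V} \boldsymbol{\Lambda}^{-1} \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \mathbf{u}_l\right) \\
&= \sum_{l=N}^{L-1} \mathbf{u}_l^H \hat{\mathbf{R}}\, \mathbf{g}_{lk}
\end{aligned} \qquad (18)$$

where $\mathbf{g}_{lk} = \mathbf{V} \boldsymbol{\Lambda}^{-1} \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \mathbf{u}_l$.
Finally, to obtain the MSE of the channel estimation algorithm, we have to explore the fourth order statistics of binary and Gaussian random variables. In Appendix A it is demonstrated that

$$E[\|\Delta\mathbf{h}\|^2] = \frac{\sigma_r^2}{N_s} \sum_{k=0}^{M-1} \left( \mathrm{Trace}\{\mathbf{U}^H \mathbf{U} \mathbf{G}_k^H \tilde{\mathbf{C}} \tilde{\mathbf{C}}^H \mathbf{G}_k\} + \sigma_r^2\, \mathrm{Trace}\{\mathbf{U}^H \mathbf{U} \mathbf{G}_k^H \mathbf{G}_k\} \right) \qquad (19)$$

where $\mathbf{G}_k = [\mathbf{g}_{Nk}, \dots, \mathbf{g}_{(L-1)k}]$ and $\tilde{\mathbf{C}} = [\sigma_1 \tilde{\mathbf{c}}_1, \dots, \sigma_N \tilde{\mathbf{c}}_N]$.
5. SIMULATIONS
In this section we compare the analytical expression (19) with the MSE obtained from computer simulations of the algorithm (9) to illustrate the validity of the approximations made in the previous section.
Figure 2 examines the accuracy of the MSE analysis. It shows the time evolution of the theoretical and simulated MSE (the latter averaged over 50 realizations). An environment with $L = 12$ carriers, a channel length $M = 4$ and 8 users received with an SNR of 12 dB was considered. It can be seen that, even for a small number of symbols, the theoretical expression fits the simulated MSE.
Figure 3 shows the simulated and theoretical MSE versus the Signal to Noise Ratio (SNR) of the received users. The environment is the same as before and the curves are obtained after $N_s = 200$ symbols. We can see that both curves are very similar even for small values of SNR.
Figure 3: Simulated and theoretical MSE vs. received users SNR.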
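As a complementary numerical sanity check, the first-order relation between the channel error and the perturbation of $\mathbf{Q}$ used in Section 4, together with the normalization by the first tap, can be tested on a small synthetic example. The sketch below is not taken from the paper: the matrix `Q`, the channel `h` and the perturbation size are hypothetical, and `Q` is built directly with a known null vector rather than from received data.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
h = np.array([1.0, 0.5j, -0.3, 0.2 + 0.1j])    # hypothetical true channel, h(0) = 1

# Hermitian positive semidefinite Q with Q h = 0 and other eigenvalues >= 1.
P = np.eye(M) - np.outer(h, h.conj()) / np.vdot(h, h).real   # projector orthogonal to h
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Q = P @ (np.eye(M) + B.conj().T @ B) @ P
Q = 0.5 * (Q + Q.conj().T)                     # remove rounding asymmetry

A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
dQ = 1e-5 * (A + A.conj().T)                   # small Hermitian perturbation
Q_hat = Q + dQ

# Pseudo-inverse of Q from its three non-zero eigenpairs.
w, V = np.linalg.eigh(Q)
Q_pinv = (V[:, 1:] / w[1:]) @ V[:, 1:].conj().T

# "Exact" perturbed estimate: minimum-eigenvalue eigenvector, then normalized
# so that its first element equals the true h(0).
h_hat = np.linalg.eigh(Q_hat)[1][:, 0]
h_hat = h_hat * h[0] / h_hat[0]
dh_exact = h_hat - h

# First-order prediction: normalization matrix applied to -Q^+ dQ h.
e1 = np.zeros(M)
e1[0] = 1.0
T = np.eye(M) - np.outer(h, e1) / h[0]
dh_first_order = T @ (-Q_pinv @ dQ @ h)

rel_gap = np.linalg.norm(dh_exact - dh_first_order) / np.linalg.norm(dh_first_order)
```

The relative gap between the exact and the predicted error shrinks in proportion to the size of the perturbation, which is the behavior a first-order analysis should show.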
6. CONCLUSIONS
A new blind channel identification method for Multi-Carrier CDMA systems has been presented. The method exploits the orthogonality between the signal and noise subspaces of the incoming signal. The performance of the method has also been investigated: using a perturbation technique, we derived an approximate analytical expression of the estimation MSE. Computer simulations have shown that this analytical approximation is highly accurate.
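Figure 2: Time evolution of the simulated and theoretical MSE.
A. APPENDIX
Taking into account that $\tilde{\mathbf{c}}_i^H \mathbf{u}_l = \mathbf{u}_l^H \tilde{\mathbf{c}}_i = 0$, it is straightforward to obtain from (18) that

$$\Delta h(k) = \frac{1}{N_s} \sum_{l=N}^{L-1} \sum_{n=0}^{N_s-1} \mathbf{u}_l^H \left( \sum_{i=1}^{N} \mathbf{r}_n (s_n^i)^* \tilde{\mathbf{c}}_i^H + \mathbf{r}_n \mathbf{r}_n^H \right) \mathbf{g}_{lk} \qquad (20)$$

where $*$ represents conjugate. Therefore, the MSE is

$$\begin{aligned}
E[\|\Delta\mathbf{h}\|^2] &= \sum_{k=0}^{M-1} E[\Delta h(k)\, \Delta h^*(k)] \\
&= \sum_{k=0}^{M-1} \Bigg( \frac{1}{N_s^2} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \sum_{n=0}^{N_s-1} \sum_{m=0}^{N_s-1} \sum_{i=1}^{N} \sum_{j=1}^{N} E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_m^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \tilde{\mathbf{c}}_j s_m^j (s_n^i)^* \tilde{\mathbf{c}}_i^H \mathbf{g}_{lk}] \\
&\qquad + \frac{1}{N_s^2} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \sum_{n=0}^{N_s-1} \sum_{m=0}^{N_s-1} E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_n^H \mathbf{g}_{lk}\, \mathbf{g}_{pk}^H \mathbf{r}_m \mathbf{r}_m^H \mathbf{u}_p] \Bigg)
\end{aligned} \qquad (21)$$

where we have used the fact that the third order moments of a Gaussian random variable are zero.
Considering statistical independence between users and noise and i.i.d. user symbols, the first expectation in (21) is

$$E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_m^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \tilde{\mathbf{c}}_j s_m^j (s_n^i)^* \tilde{\mathbf{c}}_i^H \mathbf{g}_{lk}] = \sigma_i^2 \sigma_r^2\, \mathbf{u}_l^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \tilde{\mathbf{c}}_i \tilde{\mathbf{c}}_i^H \mathbf{g}_{lk}\, \delta(n - m)\, \delta(i - j) \qquad (22)$$

where $\delta(\cdot)$ is the Kronecker function.
The second expectation in (21) can be expressed as

$$\begin{aligned}
E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_n^H \mathbf{g}_{lk}\, \mathbf{g}_{pk}^H \mathbf{r}_m \mathbf{r}_m^H \mathbf{u}_p] &= \mathbf{u}_l^H E[\mathbf{r}_n \mathbf{r}_n^H]\, \mathbf{g}_{lk}\, \mathbf{g}_{pk}^H E[\mathbf{r}_m \mathbf{r}_m^H]\, \mathbf{u}_p \\
&\quad + \mathbf{u}_l^H E[\mathbf{r}_n \mathbf{r}_m^H]\, \mathbf{u}_p\, \mathbf{g}_{pk}^H E[\mathbf{r}_m \mathbf{r}_n^H]\, \mathbf{g}_{lk} \\
&= \sigma_r^4\, \mathbf{u}_l^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \mathbf{g}_{lk}\, \delta(n - m)
\end{aligned} \qquad (23)$$

where we have used the facts $\mathbf{u}_l^H \mathbf{g}_{lk} = 0$ and $E[\theta_1 \theta_2^* \theta_3 \theta_4^*] = E[\theta_1 \theta_2^*] E[\theta_3 \theta_4^*] + E[\theta_1 \theta_4^*] E[\theta_3 \theta_2^*]$ when $\theta_i$, $i = 1, 2, 3, 4$, are four independent Gaussian variables [7].
Including (22) and (23) in (21), we obtain

$$E[\|\Delta\mathbf{h}\|^2] = \frac{1}{N_s^2} \sum_{k=0}^{M-1} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \sum_{n=1}^{N_s} \left( \sum_{i=1}^{N} \sigma_i^2 \sigma_r^2\, \mathbf{u}_l^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \tilde{\mathbf{c}}_i \tilde{\mathbf{c}}_i^H \mathbf{g}_{lk} + \sigma_r^4\, \mathbf{u}_l^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \mathbf{g}_{lk} \right) \qquad (24)$$

which is equivalent to (19).
REFERENCES
[1] K. Fazel, G. P. Fettweis, Multi-Carrier Spread-Spectrum, Kluwer Academic Publishers, 1997.
[2] N. Yee, J. P. Linnartz, G. Fettweis, “Multi-Carrier CDMA in Indoor Wireless Radio Networks”, Proc. International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC93), Yokohama, pp. 109-113, 1993.
[3] E. Moulines, P. Duhamel, J. F. Cardoso and S. Mayrargue, “Subspace Methods for the Blind Identification of Multichannel FIR Filters”, IEEE Transactions on Signal Processing, vol. 43, no. 2, pp. 516-525, February 1995.
[4] D. I. Iglesia, C. J. Escudero, L. Castedo, “A Subspace Method for Blind Channel Identification in Multi-Carrier CDMA Systems”, Second International Workshop on Multi-Carrier Spread Spectrum & Related Topics (MCSS’99), Kluwer Academic Publishers, September 1999.
[5] G. Strang, Linear Algebra and its Applications, Harcourt Brace Jovanovich, Third Edition, 1988.
[6] P. Stoica and T. Soderstrom, “Statistical Analysis and Subspace Rotation Estimates of Sinusoidal Frequencies”, IEEE Transactions on Signal Processing, vol. 39, no. 8, pp. 1836-1847, August 1991.
[7] W. Qiu, Y. Hua, “Performance Analysis of the Subspace Method for Blind Channel Identification”, Signal Processing, no. 50, pp. 71-81, 1996.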
BLIND ADAPTIVE ASYNCHRONOUS CDMA
MULTIUSER DETECTOR USING PREDICTION
LEAST MEAN KURTOSIS ALGORITHM
Kunjie Wang and Yeheskel Bar-Ness
Center for Communications and Signal Processing Research
Department of Electrical and Computer Engineering
New Jersey Institute of Technology
University Heights, Newark, NJ 07102, USA
Tel: 1-973-596-3520 Fax: 1-973-596-8473
Email: wangk@njit.edu Cc: barness@njit.edu
ABSTRACT
In this paper, a new blind adaptive multiuser detector,
which is termed the prediction least mean kurtosis (PLMK)
algorithm, is proposed for joint MAI and narrowband
interference (NBI) suppression in asynchronous CDMA
systems. This algorithm is based on higher-order
statistics rather than the second-order statistics used in the
LMS algorithm. Unlike the regular least mean kurtosis
(LMK) algorithm, it takes into consideration samples earlier
than those corresponding to the current bit. For comparison purposes,
we also apply the regular LMK algorithm to the case of
asynchronous CDMA systems. Simulation results show
that the blind adaptive multiuser detector with the PLMK
algorithm provides significantly better performance than
the one with the regular LMK algorithm.
1. INTRODUCTION
Blind adaptive multiuser detection has received significant
attention because it can be implemented without
training sequences in CDMA systems. During the past
several years, many researchers in this area have focused
their efforts on the least mean square (LMS) algorithm due
to its low complexity. To achieve better performance in
suppressing multiple-access interference (MAI) in
synchronous CDMA systems, Tang et al. [3] (1) applied
instead the least mean kurtosis (LMK) algorithm. The
LMK algorithm is based on higher-order statistics
rather than the second-order statistics used in the LMS
algorithm.
In this paper, a new blind adaptive multiuser detector,
termed the prediction least mean kurtosis (PLMK) algorithm,
is proposed for joint MAI and narrowband interference
(NBI) suppression in asynchronous CDMA systems.
Unlike the regular LMK, it takes into consideration
samples earlier than those corresponding to the current bit. For
comparison purposes, we also apply the regular LMK
algorithm of [3] to the case of asynchronous CDMA
systems. Simulation results show that the blind adaptive
multiuser detector with the PLMK algorithm provides
significantly better performance than the one with the LMK
algorithm.
This research was partially supported by New Jersey
Center for Wireless Telecommunications.
(1) Note that in [3] only the synchronous case was considered.
2. SYSTEM MODEL
We consider the low-pass equivalent model of an
asynchronous CDMA system. The received signal due to
the $k$th user is given by

$$r_k(t) = \sum_{i} \sqrt{P_k}\, b_k(i)\, s_k(t - iT - \tau_k) \qquad (1)$$

where $T$ is the bit interval and $b_k(i) \in \{-1, 1\}$ is the information
data of the $k$th user. $P_k$ and $\tau_k$ denote the power and
relative delay of the $k$th user, respectively. The spreading
waveform $s_k(t)$ is given by

$$s_k(t) = \sum_{n=1}^{N} a_k(n)\, \psi(t - nT_c) \qquad (2)$$

where $a_k(n) \in \{-1, 1\}$ is the $n$th element of the spreading
sequence for the $k$th user, $N$ is the processing gain and
$T_c = T/N$ is the chip duration. $\psi(t)$ is a normalized
rectangular pulse of width $T_c$, i.e., $\int \psi^2(t)\, dt = 1$.
The total received signal can be written as

$$r(t) = \sum_{k=1}^{K} r_k(t) + i(t) + n(t) \qquad (3)$$

where $K$ is the number of users, $i(t)$ is the NBI and $n(t)$
is the white Gaussian noise.
The received signal $r(t)$ is assumed to pass through a
chip-matched filter sampled at the chip rate and synchronized
to chip time. The $l$th received signal sample at the output
of the chip-matched filter is

$$r(l) = \int_{lT_c}^{(l+1)T_c} r(t)\, \psi(t - lT_c)\, dt \qquad (4)$$

from which the $l$th NBI sample and the $l$th white
Gaussian noise sample at the output of the chip-matched
filter are $i(l) = \int_{lT_c}^{(l+1)T_c} i(t)\, \psi(t - lT_c)\, dt$ and
$n(l) = \int_{lT_c}^{(l+1)T_c} n(t)\, \psi(t - lT_c)\, dt$, respectively.
In this paper, we assume that the NBI is modeled as a
$p$th-order AR process, i.e.,

$$i(l) = \sum_{j=1}^{p} \alpha_j\, i(l - j) + e(l) \qquad (5)$$

where $e(l)$ is a white Gaussian process with variance $\varepsilon^2$.
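A narrowband interferer of this autoregressive form is easy to synthesize. The sketch below is illustrative only: it assumes the sign convention $i(l) = \sum_j \alpha_j\, i(l-j) + e(l)$ with zero initial conditions, and the function name `generate_ar_nbi` and its parameters are hypothetical.

```python
import numpy as np

def generate_ar_nbi(alpha, num_samples, innovation_std, rng):
    """Return (nbi, e) with nbi[l] = sum_j alpha[j-1] * nbi[l-j] + e[l].

    Samples before l = 0 are taken as zero (an assumption of this sketch).
    """
    alpha = np.asarray(alpha, dtype=float)
    p = len(alpha)
    buf = np.zeros(num_samples + p)          # p leading zeros: initial conditions
    e = innovation_std * rng.standard_normal(num_samples)
    for l in range(num_samples):
        past = buf[l:l + p][::-1]            # i(l-1), ..., i(l-p)
        buf[l + p] = np.dot(alpha, past) + e[l]
    return buf[p:], e

rng = np.random.default_rng(4)
nbi, e = generate_ar_nbi([0.99], 200, 0.1, rng)   # first-order example
```

A coefficient close to one, as in the first-order example, gives a slowly varying and therefore highly predictable interferer, which is the property the prediction-based detector exploits.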
3. BLIND PREDICTION LMK
ALGORITHM
Without loss of generality, we assume that the power and
the delay of the desired signal are, respectively, $P_1 = 1$
and $\tau_1 = 0$, and, for convenience, we define $\tau_k = d_k T_c$ where
$d_k$ is an integer between 0 and $N - 1$. In [3], the LMK
algorithm is based on the received signal samples vector
$\mathbf{r}^T = [r(0), r(1), \dots, r(N-1)]$. It is well known that the
current value of the NBI is predictable from its past values.
Therefore, we expect better performance by extending the
received signal samples vector into the interval
$[-MT_c, T]$ ($M > 0$), i.e.,
$\mathbf{r}^T = [r(-M), r(-M+1), \dots, r(-1), r(0), r(1), \dots, r(N-1)]$,
which is termed the PLMK algorithm. We consider the case
of $M < N$ in this paper. For a given relative delay vector
$\mathbf{d} = [d_1, \dots, d_K]^T$, we can obtain from (1)~(4)

$$\mathbf{r} = \sqrt{P_1}\,(b_1 \mathbf{a}_1 + b_1' \mathbf{a}_1') + \sum_{k=2}^{K} \sqrt{P_k}\,(b_k \mathbf{a}_k + b_k' \mathbf{a}_k' + b_k'' \mathbf{a}_k'') + \mathbf{i} + \mathbf{n} \qquad (6)$$

where for $-M \le l \le N - 1$ and $2 \le k \le K$

$$\mathbf{a}_1(l) = [a_1(l)]\, \chi_{(l \ge 0)} \qquad (7)$$
$$\mathbf{a}_1'(l) = [a_1(l + N)]\, \chi_{(l < 0)} \qquad (8)$$
$$\mathbf{a}_k(l) = [a_k(l - d_k)]\, \chi_{(d_k \le l < N)} \qquad (9)$$
$$\mathbf{a}_k'(l) = [a_k(l + N - d_k)]\, \chi_{(-N + d_k \le l < d_k)} \qquad (10)$$
$$\mathbf{a}_k''(l) = [a_k(l + 2N - d_k)]\, \chi_{(l < -N + d_k)} \qquad (11)$$

with $\chi_A$ the indicator function for the set $A$; $b_k$ is the
current bit of the $k$th user, and $b_k'$ and $b_k''$ are the bits one and two
bits earlier than the current bit of the $k$th user,
respectively.
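From (6), we notice that there are $3(K - 1) + 2 = 3K - 1$
vectors $\{\sqrt{P_1}\mathbf{a}_1, \sqrt{P_1}\mathbf{a}_1'\}$ and $\{\sqrt{P_k}\mathbf{a}_k, \sqrt{P_k}\mathbf{a}_k', \sqrt{P_k}\mathbf{a}_k''\}$,
$k = 2, \dots, K$. Depending on the relative delays of the
multiuser interferers, $L$ of these vectors
($2K \le L \le 3K - 1$) are non-zero. For the $L$ non-zero
vectors, we write Eqn. (6) in the form

$$\mathbf{r} = \sum_{l=1}^{L} b_l\, \mathbf{p}_l + \mathbf{i} + \mathbf{n} \qquad (12)$$

where the non-zero vector $\mathbf{p}_1$ is the desired signal vector
$\sqrt{P_1}\mathbf{a}_1$, and $b_1$ is the desired bit. The set of non-zero
vectors $\{\mathbf{p}_2, \dots, \mathbf{p}_L\}$ consists of the intersymbol
interference (ISI) $\{\sqrt{P_1}\mathbf{a}_1'\}$ and the non-zero MAI vectors
of the set $\{\sqrt{P_k}\mathbf{a}_k, \sqrt{P_k}\mathbf{a}_k', \sqrt{P_k}\mathbf{a}_k''\}$, $k = 2, \dots, K$.
$\{b_2, \dots, b_L\}$ are the data coefficients corresponding to the
vectors $\{\mathbf{p}_2, \dots, \mathbf{p}_L\}$, respectively. For example, $b_l = b_k'$
if $\mathbf{p}_l = \sqrt{P_k}\mathbf{a}_k'$, $2 \le l \le L$, $1 \le k \le K$.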
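The way one interferer's chips split across the current bit and the one or two preceding bits can be made concrete with a small helper. This is a sketch under stated assumptions: the spreading sequence is indexed from 0 to $N-1$, the window runs over $l = -M, \dots, N-1$ with $M < N$, and the function name and the example sequence are hypothetical.

```python
import numpy as np

def extended_code_vectors(seq, d, M):
    """Split a user's code over the window l = -M, ..., N-1 for chip delay d.

    Returns (a, a1, a2): the parts multiplied by the current bit, the previous
    bit and the bit two bits earlier, following the indicator-function form.
    """
    N = len(seq)
    a = np.zeros(M + N)
    a1 = np.zeros(M + N)
    a2 = np.zeros(M + N)
    for idx, l in enumerate(range(-M, N)):
        if d <= l < N:                       # chips of the current bit
            a[idx] = seq[l - d]
        elif -N + d <= l < d:                # chips of the previous bit
            a1[idx] = seq[l + N - d]
        else:                                # l < -N + d: two bits earlier
            a2[idx] = seq[l + 2 * N - d]
    return a, a1, a2

seq = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])   # hypothetical length-7 code
a, a1, a2 = extended_code_vectors(seq, d=6, M=3)
a0, a0_prev, a0_prev2 = extended_code_vectors(seq, d=0, M=3)
```

With $d = 6$ and $M = 3$ all three vectors are non-zero, while for $d = 0$ the third one vanishes; counting the non-zero vectors over all users gives the number $L$ of terms in the compact form.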
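We use the following cost function of [3] to suppress
interference without requiring a training sequence:

$$J_B(\mathbf{h}) = 3\left[E(\mathbf{r}^T \mathbf{h})^2\right]^2 - E(\mathbf{r}^T \mathbf{h})^4 \qquad (13)$$

Taking the gradient with respect to the vector $\mathbf{h}$, we have

$$\nabla J_B(\mathbf{h}) = 12\, E(\mathbf{r}^T \mathbf{h})^2\, E[(\mathbf{r}^T \mathbf{h})\, \mathbf{r}] - 4\, E[(\mathbf{r}^T \mathbf{h})^3\, \mathbf{r}] \qquad (14)$$

The mean value $E(\mathbf{r}^T \mathbf{h})^2$ is estimated by the
recursive equation

$$G(n) = \beta\, G(n - 1) + (1 - \beta)\left[\mathbf{r}(n)^T \mathbf{h}(n)\right]^2 \qquad (15)$$

where $0 < \beta < 1$ is the forgetting factor.
Using this estimate and the instantaneous estimate $\mathbf{r}(n)^T \mathbf{h}(n)$ of
$E(\mathbf{r}^T \mathbf{h})$, we obtain the following equation

$$\nabla J_B[\mathbf{h}(n)] = 4\left\{3G(n) - \left[\mathbf{r}(n)^T \mathbf{h}(n)\right]^2\right\} \mathbf{r}(n)^T \mathbf{h}(n)\, \mathbf{r}(n) \qquad (16)$$

Then the steepest-descent adaptive weight-update algorithm,
the PLMK algorithm, can be characterized by

$$\mathbf{h}(n + 1) = \mathbf{h}(n) - \mu \left\{\nabla J_B[\mathbf{h}(n)]\right\} \qquad (17)$$

with $\nabla J_B[\mathbf{h}(n)]$ from (16) and $G(n)$ from (15). Since
no training sequence is needed, the PLMK
algorithm is blind.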
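One iteration of the recursion can be written compactly. The following is a minimal sketch of a single update for real-valued samples; the function name `plmk_step` and the toy numbers are hypothetical, and initialization of the weight vector and of $G$ is left to the caller.

```python
import numpy as np

def plmk_step(h, r, G_prev, beta, mu):
    """One weight update: returns (h_next, G)."""
    y = float(r @ h)                              # detector output r(n)^T h(n)
    G = beta * G_prev + (1.0 - beta) * y**2       # running estimate of E(r^T h)^2
    grad = 4.0 * (3.0 * G - y**2) * y * r         # instantaneous gradient
    h_next = h - mu * grad                        # steepest-descent step
    return h_next, G

# Toy numbers chosen so the arithmetic is easy to follow by hand.
h0 = np.array([1.0, 0.0])
r0 = np.array([2.0, 1.0])
h1, G1 = plmk_step(h0, r0, G_prev=0.0, beta=0.5, mu=0.1)
```

In this toy call the output is $y = 2$, so $G = 2$, the gradient is $16\,\mathbf{r}$, and the new weights are $[-2.2, -1.6]$. The update uses only the received samples, with no reference bits, which is what makes the scheme blind.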
4. SIMULATION RESULTS
Simulation results evaluating the
performance of the PLMK algorithm are depicted in Fig. 1.
For comparison, we add the results obtained with the regular
LMK algorithm [3] applied to the asynchronous case, which can
be obtained from PLMK with $M = 0$. In this simulation,
we use a three-user CDMA system employing Gold codes
of length 7. To calculate the averaged SIR at the $n$th
iteration, we use the expression given in [2]:

$$\mathrm{SIR}(n) = \frac{\sum_{j=1}^{J} \left[\mathbf{h}_j(n)^T \mathbf{p}_1\right]^2}{\sum_{j=1}^{J} \left\{\mathbf{h}_j(n)^T \left[\mathbf{r}_j(n) - b_{1j}(n)\, \mathbf{p}_1\right]\right\}^2}$$

where $J$ is the number of times the simulation is
repeated and the subscript $j$ indexes the run. Each of the other CDMA users has power $P$
larger than the desired CDMA user power $P_1 = 1$. The
delay vector is set to $\mathbf{d} = [0, 1, 3, 6]^T$. The NBI is modeled
as a first-order AR process with $\alpha_1 = 0.99$ and a power
3 dB higher than that of the desired signal. The white noise
power is set to 0.1. We use $M = 3$, $P = 10$, $\beta = 0.4$,
$\mu = 6 \times 10^{-4}$ and $J = 500$. From Fig. 1, we can easily
see that the PLMK algorithm provides significantly better
performance than the regular LMK algorithm with almost
the same convergence rate.
5. CONCLUSIONS
In this paper, we proposed a new blind adaptive
multiuser detector based on the prediction least mean
kurtosis (PLMK) algorithm for joint suppression of MAI
and NBI in asynchronous CDMA systems. For
comparison, we also applied the regular LMK
algorithm of [3] to the case of asynchronous CDMA
systems. Results show that the blind adaptive
multiuser detector with the PLMK algorithm provides
significantly better performance than the one with the
regular LMK algorithm.
6. REFERENCES
[1] O. Tanrikulu and A.G. Constantinides, “Least-mean
kurtosis: A novel high-order statistics based adaptive
filtering algorithm”, IEE Electron. Lett., vol. 30, pp.
189-190, 1994.
[2] M. Honig, U. Madhow and S. Verdu, “Blind adaptive
multiuser detection”, IEEE Trans. Inform. Theory,
vol. IT-41, no. 4, pp. 944-960, July 1995.
[3] Z. Tang, Z. Yang and Y. Yao, “Blind multiuser
detector based on LMK criterion”, IEE Electron.
Lett., vol. 35, pp. 267-268, 1999.
Fig. 1 Averaged output SIR versus number of iterations ($N = 7$, $M = 3$, $K = 3$)
MMSE EQUALIZATION FOR FORWARD LINK IN 3G CDMA: SYMBOL-LEVEL
VERSUS CHIP-LEVEL *
Thomas P. Krauss, William J. Hillery, and Michael D. Zoltowski
School of Electrical Engineering, Purdue University
West Lafayette, IN 47907-1285
e-mail: krauss@purdue.edu, hilleryw@ecn.purdue.edu, mikedz@ecn.purdue.edu
ABSTRACT
We investigate a “symbol-level” MMSE equalizer for the CDMA downlink over a frequency-selective multipath channel meant to improve on the recently proposed “chip-level” downlink equalizers. Indeed the symbol-level equalizer performs better than the chip-level, but is computationally more demanding. The symbol-level equalizer is optimal for “saturated cells” where all Walsh-Hadamard channel codes are in use and have equal power. It performs very close to optimal even for relatively lightly loaded cells. We derive a bound on the off-diagonals of the covariance matrix of the transmitted data that helps explain why the equalizer works when there are fewer active channel codes than the spreading factor. Performance is evaluated through simulations to obtain the average bit error rate (BER) over a class of channels for two cases: no out-of-cell interference, and one equal power base-station. The symbol- and chip-level equalizers are compared to the conventional RAKE receiver.
1. INTRODUCTION
Chip-level downlink equalization is a good candidate for improving capacity (in terms of users and/or data rate) in 3G cellular systems such as cdma2000 [1]. These equalizers significantly cancel multi-user access interference (MAI), the main performance limitation for the standard RAKE receiver. The good qualities of the recently proposed “chip-level equalizers” for CDMA downlink are that they need knowledge only of the desired user’s spreading code (and long-code), they change only as often as the channel so don’t need to be recomputed every symbol, and the same equalizer applies to all users from a given base-station. However, these equalizers do not yield the optimal estimate of the transmitted symbol.
The optimal equalizer is conditioned on all of the channel codes in use and their powers, and also the base-station dependent long code. Since these aren’t really random quantities, it should be possible to improve on the performance by using them. One option approaching the optimal one, but still having the nice feature of only needing to know the channel code(s) of the desired user, is derived here. We refer to this as the “symbol-level” equalizer. This equalizer changes every symbol, unlike the chip-level equalizer. We find that this equalizer leads to a performance improvement over the chip-level equalizer when all channel codes are in use and are equal power (in which case the derived equalizer is equal to the optimal symbol estimate). We also make some arguments, and show simulation results, that show this equalizer is applicable when there are fewer active channel codes per cell.
* This research was supported by the Texas Instruments DSP University Research Program and the Air Force Office of Scientific Research under Grant No. F49620-00-1-0127.
In this paper we derive the symbol-level MMSE estimator for the two base-station case. One base-station transmits the desired user’s data, while the other base-station is considered interference. Spatial diversity and/or oversampling with respect to the chip rate are handled as multiple chip-spaced channels. Our simulations assume spatial diversity is provided by two antennas at the receiver which experience independent fading, and oversample at twice the chip rate.
Some relevant papers on linear chip-level downlink equalizers that restore orthogonality of the Walsh-Hadamard channel codes and hence suppress MAI are [2, 3, 4, 5, 6, 7, 8]. Of these, [4, 7, 8] address antenna arrays, while the others consider a single antenna, possibly with oversampling. In Reference [8] we compare one and two antenna receivers. The interference from other base-stations is addressed in Ghauri and Slock [4], Frank and Visotsky [3], and by Krauss and Zoltowski in [7].
In this paper the channel and noise power are assumed known (i.e., channel estimation error is neglected). Using the exact channel in simulation and analysis leads to an informative upper bound on the performance of these methods, but must be understood as such. For adaptive versions of linear chip equalizers for CDMA downlink see [3] and [6] and some of the references in [5]. [3, 4] present performance analysis in the form of SINR expressions for the multiple base-station case, for the chip-level equalizer. In [7] Krauss and Zoltowski show that the SINR expression along with a Gaussian assumption is a good predictor of uncoded BER for BPSK symbols for the chip-level equalizers.
2. DATA AND CHANNEL MODEL
The impulse response for the $i$th antenna channel, between the $k$th base-station transmitter and the mobile-station receiver, is

$$h_i^{(k)}(t) = \sum_{l=0}^{N_a - 1} h_i^{c(k)}[l]\, p_{rc}(t - \tau_l), \qquad i = 1, 2, \quad k = 1, 2 \qquad (1)$$

$p_{rc}(t)$ is the composite chip waveform (including both the transmit and receive low-pass filters) which we assume has a raised-cosine spectrum. $N_a$ is the total number of delayed paths or “multipath arrivals,” some of which may have zero or negligible power without loss of generality.
The channel we consider for this work consists of $N_a = 17$ equally spaced paths 0.625 μs apart ($\tau_0 = 0$, $\tau_1 = 0.625\ \mu$s, ...); this yields a delay spread of at most 10 μs, which is an upper bound for most channels encountered in urban cellular systems. We model the class of channels with 4 equal-power random coefficients with arrival times picked randomly from the set $\{\tau_0, \tau_1, \dots, \tau_{16}\}$; the rest of the coefficients are zero. For base-station 1, once the 4 arrival times have been picked at random and then sorted, the first and last arrival times are forced to be at 0 and the maximum delay spread of 10 μs, respectively. Base-station 2’s arrival times are chosen in the same fashion and independently of base-station 1’s, but without forcing arrivals at 0 and 10 μs. The coefficients are equal-power, complex-normal random variables, independent of each other. The arrival times at antennas 1 and 2 associated with a given base-station are the same, but the coefficients are independent.
The “multi-user chip symbols” for base-station k , s^[n],
may be described as
base-station k, k = 1,2. The equalizer coefficients q*k) [n]
comprise the equalizer vector
g(fc)=[gr ■■■sTf (5)
where
g.(fc) - [g\k) [0], g(k) [1], • • • , 9{k) [N, - 1]]T i = 1, . . . , M. (6)
The MNg x 1 vectorized received signal is given by
y[n] = H(1)s(1) [n] + H(2)s(2) [n] + rj[n] (7)
N(uk) N.- 1
*(fc)N = c6»}N YI - Ncm] (2)
j= 1 m=0
where the various quantities are defined as follows: c[kJ [n] is
the base-station dependent long code; is the j,h user’s
gain; 6^[m] is the jth user’s bit/symbol sequence; cj*^[n],
n = 0,1,..., Nc — 1, is the jth user’s channel (short) code;
Nc is the length of each channel code (assumed the same for
each user); N ^ is the total number of active users; N, is the
number of bit/symbols transmitted during a given time win¬
dow. The signal received at the ith antenna (after convolving
with a matched filter impulse response having a square-root
raised cosine spectrum) from base-station k is
y(ik](t) = y^ sW[n]h\k)(t- nTc) * = 1,2 (3)
n
where h\k\t) is as defined in Eqn. (1). The total received
signal at the mobile-station is simply the sum of the contri¬
butions from the different base-stations plus noise:
yi(t) = y,w(t) + y\2)(t) + q,(t) * = 1, 2. (4)
ydt) is a noise process assumed white and gaussian prior to
coloration by the receiver chip-pulse matched filter.
For the first antenna, we oversample the signal yi (t) in
Eqn. (4) at twice the chip-rate to obtain yi[n] = yi (nTc)
and y2 [n] = yi (“ + nTc). These discrete-time signals have
corresponding impulse responses = fcj (<)!t=nTc and
[n] = hjfc^(t)|t_rc+nI, for base-stations k = 1,2.
For the second antenna, we also oversample the signal j/s <(t)
in Eqn. (4) at twice the chip-rate to obtain 2/3 [n] = 1/2 (nTc)
and </4 [n] = 2/2 + nTc). These discrete-time signals have
corresponding impulse responses [n] = h^k\t)\t=nTc and
h{k^ [n] = (t)\t_T^^nT for base-stations k — 1, 2.
Let M denote the total number of chip-spaced channels
due to both receiver antenna diversity and / or oversampling.
3. CHIP-LEVEL EQUALIZER
The “Chip-level” MMSE equalizer is shown in Figure 1 (two antenna case with no oversampling). It estimates the multi-user synchronous sum signal for either base-station 1 or 2, and then correlates with the desired user’s channel code times that base-station’s long code. To derive the chip-level MMSE equalizer, it is useful to define signal vectors and channel matrices based on the equalizer length $N_g$. The “recovered” chip signal will be $\hat{s}^{(k)}[n - D] = g^{(k)H} y[n]$ for some delay $D$, where $g^{(k)}$ is the $MN_g \times 1$ chip-level equalizer for base-station $k$.
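$$s^{(k)}[n] = [\,s^{(k)}[n],\; s^{(k)}[n-1],\; \ldots,\; s^{(k)}[n - (N_g + L - 2)]\,]^T$$
$H_i^{(k)}$ is the $N_g \times (L + N_g - 1)$ convolution matrix
$$H_i^{(k)} = \begin{bmatrix} h_i^{(k)}[0] & 0 & \cdots & 0 \\ h_i^{(k)}[1] & h_i^{(k)}[0] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h_i^{(k)}[L-1] & h_i^{(k)}[L-2] & \cdots & h_i^{(k)}[0] \\ 0 & h_i^{(k)}[L-1] & \cdots & h_i^{(k)}[1] \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_i^{(k)}[L-1] \end{bmatrix}^T$$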
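Equation (7) is more compactly written as
$$y[n] = \mathcal{H} s[n] + \eta[n] \qquad (11)$$
$$\mathcal{H} = [\,\mathcal{H}^{(1)} \;\vdots\; \mathcal{H}^{(2)}\,] \qquad (12)$$
$$s[n] = [\,s^{(1)T}[n] \;\; s^{(2)T}[n]\,]^T. \qquad (13)$$
The MMSE criterion is
$$\min_{g^{(k)}} E\{|g^{(k)H}(\mathcal{H} s[n] + \eta[n]) - \delta_D^T s^{(k)}[n]|^2\} \qquad (14)$$
where $\delta_D$ is all zeroes except for unity in the $(D+1)$th position (so that $\delta_D^T s^{(k)}[n] = s^{(k)}[n - D]$).
We assume unit energy signals, $E\{|s^{(k)}[n]|^2\} = 1$, and furthermore that the chip-level symbols $s^{(k)}[n]$ are independent and identically distributed, $E\{s[n]s^H[n]\} = I$. This is the case if the base-station dependent long codes, $c_{bs}^{(k)}[n]$, are treated as iid sequences, a very good assumption in practice. The equalizer which attains the minimum is
$$g^{(k)} = (\mathcal{H}\mathcal{H}^H + R_{\eta\eta})^{-1} \mathcal{H}^{(k)} \delta_D. \qquad (15)$$
The MMSE is
$$\mathrm{MMSE} = 1 - \delta_D^T \mathcal{H}^{(k)H} (\mathcal{H}\mathcal{H}^H + R_{\eta\eta})^{-1} \mathcal{H}^{(k)} \delta_D. \qquad (16)$$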
Figure 1. Chip and Symbol MMSE Estimators for kth Base-Station, two antennas, no oversampling. (Panels: chip-level; symbol-level.)
The MMSE equalizer is a function of the delay $D$. The MMSE may be computed for each $D$, $0 \le D \le N_g + L - 2$, with only one matrix inversion (which has to be done to form $g^{(k)}$ anyway). Once the $D$ yielding the smallest MMSE is determined, the corresponding equalizer $g^{(k)}$ may be computed without further matrix inversion or system solving.
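The delay search described above can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions that are not in the paper: a single real-valued channel from one base-station, white noise with covariance `noise_var * I`, and a small Gauss-Jordan inverse written out by hand. All function names are hypothetical. The covariance is inverted once; every delay then reuses that inverse.

```python
def conv_matrix(h, ng):
    """Ng x (L+Ng-1) convolution matrix: row r holds h shifted right by r."""
    n_cols = len(h) + ng - 1
    return [[h[c - r] if 0 <= c - r < len(h) else 0.0 for c in range(n_cols)]
            for r in range(ng)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def chip_mmse_all_delays(h, ng, noise_var):
    """Return (MMSE per delay, best delay, equalizer for the best delay)."""
    H = conv_matrix(h, ng)
    R = matmul(H, transpose(H))          # H H^T, assuming E{s s^T} = I
    for i in range(ng):
        R[i][i] += noise_var             # assumed white noise covariance
    G = matmul(inverse(R), H)            # single inversion; column D is g for delay D
    n_delays = len(H[0])
    mmse = [1.0 - sum(H[r][d] * G[r][d] for r in range(ng)) for d in range(n_delays)]
    best = min(range(n_delays), key=mmse.__getitem__)
    return mmse, best, [row[best] for row in G]


mmse_by_delay, best_delay, g_best = chip_mmse_all_delays([1.0, 0.5, 0.2], ng=4, noise_var=0.1)
print(best_delay, [round(v, 4) for v in mmse_by_delay])
```

The equalizer for the chosen delay is simply one column of the product already formed, which is why no further system solving is needed.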
4. SYMBOL-LEVEL EQUALIZER
In this section we present what we call the “symbol-level”
MMSE estimator. This estimator depends on the user index
and symbol index, and hence varies from symbol to symbol.
The FIR estimator that we derive here is a simplified version
of that presented in [9] where in our case, all the channels and
delays from a given base-station are the same. The conclu¬
sions reached in that paper apply equally well here, namely
that FIR MMSE equalization always performs at least as well
as the “coherent combiner” (that is, the RAKE receiver).
This type of symbol-level receiver has also been presented in
[10], although again not specifically for the CDMA downlink.
The symbol-level equalizer differs from the chip-level equalizer in that the base station and Walsh-Hadamard codes do not appear explicitly in the block diagram (see Figure 1). Instead, the codes become incorporated into the equalizer itself. To derive the equalizer, we first define $a_j^{(k)}[n]$ as the bit sequence $b_j^{(k)}[m]$ upsampled by $N_c$: $a_j^{(k)}[n] = b_j^{(k)}[m]$ when $n = mN_c$ and $a_j^{(k)}[n] = 0$ otherwise. We wish to estimate $b_j^{(k)}[m]$ directly and we do this by finding
$$\min E\{|a_j^{(k)}[n - D] - \hat{a}_j^{(k)}[n - D]|^2\} \qquad (17)$$
where the minimization is done only when $n - D = mN_c$. As in the chip-level case, $\hat{a}_j^{(k)}[n - D] = g^{(k)H} y[n]$ where $y[n]$ is given by Eq. (11). Setting $n = mN_c + D$, the MSE is minimized yielding
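$$g^{(k)}[m] = (\mathcal{H} R_{ss}[mN_c + D] \mathcal{H}^H + R_{\eta\eta})^{-1} \mathcal{H} R_{bs}[m] \qquad (18)$$
where
$$R_{ss}[n] = E\{s[n]s^H[n]\} \qquad (19)$$
$$R_{bs}[m] = E\{b_j^{(k)*}[m]\, s[mN_c + D]\} \qquad (20)$$
We now proceed to derive expressions for $R_{ss}[n]$ and $R_{bs}[m]$. Using Eq. (13),
$$R_{ss}[n] = \begin{bmatrix} R^{(11)}[n] & R^{(12)}[n] \\ R^{(21)}[n] & R^{(22)}[n] \end{bmatrix} \qquad (21)$$
where $R^{(kl)}[n] = E\{s^{(k)}[n]\, s^{(l)H}[n]\}$. We assume here that the desired user is only transmitted by base station $k$. We also assume that the base station and Walsh-Hadamard codes are deterministic and known so that the only random elements in $s[n]$ are the transmitted bits. Then $E\{s^{(k)}[n]\, s^{(l)*}[m]\} = 0$ for $k \ne l$ and any $n$ and $m$, so $R^{(12)}[n] = R^{(21)}[n] = 0$. The $(i,j)$th element of $R^{(kk)}[n]$ is $S_{ij}^{(kk)}[n] = E\{s^{(k)}[n+1-i]\, s^{(k)*}[n+1-j]\}$. When $i = j$, $S_{ii}^{(kk)}[n] = 1$. When $i \ne j$,
$$S_{ij}^{(kk)}[n] = \begin{cases} \dfrac{1}{2N_u^{(k)}} B_{ij}^{(k)}[n]\, W_{ij}^{(k)}[n], & \text{when } (n+1-i) \bmod N_c = (n+1-j) \bmod N_c \\ 0 & \text{otherwise} \end{cases} \qquad (22)$$
where
$$B_{ij}^{(k)}[n] = c_{bs}^{(k)}[n+1-i]\, c_{bs}^{(k)*}[n+1-j] \in \{\pm 2, \pm 2j\} \quad \forall\, i, j \qquad (23)$$
$$W_{ij}^{(k)}[n] = \sum_{p=1}^{N_u^{(k)}} c_p^{(k)}[(n+1-i) \bmod N_c]\; c_p^{(k)*}[(n+1-j) \bmod N_c] \qquad (24)$$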
Figure 2. Bound on the potentially non-zero off-diagonal elements of $R_{ss}[n]$ [$N_c = 64$].
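With fixed $m$ and $n$, note that $[c_1^{(k)}[m], \ldots, c_{N_c}^{(k)}[m]]$ and $[c_1^{(k)}[n], \ldots, c_{N_c}^{(k)}[n]]$ are two different rows of the Hadamard matrix. The element-by-element (Schur) product of these two rows is also a row of the Hadamard matrix containing $N_c/2$ 1’s and $N_c/2$ -1’s. So
$$|W_{ij}^{(k)}[n]| \le \begin{cases} N_u^{(k)}, & N_u^{(k)} = 1, \ldots, N_c/2 \\ N_c - N_u^{(k)}, & N_u^{(k)} = N_c/2 + 1, \ldots, N_c \end{cases} \qquad (25)$$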
Therefore, when $i \ne j$ and $(n+1-i) \bmod N_c = (n+1-j) \bmod N_c$,
$$|S_{ij}^{(kk)}[n]| \le \begin{cases} 1, & N_u^{(k)} = 1, \ldots, N_c/2 \\ N_c/N_u^{(k)} - 1, & N_u^{(k)} = N_c/2 + 1, \ldots, N_c \end{cases} \qquad (26)$$
This bound is plotted as a function of $N_u^{(k)}$ in Fig. 2. Note that when $N_u^{(k)} = N_c$, $S_{ij}^{(kk)}[n] = 0$ for all $i \ne j$, so $R^{(kk)}[n] = I$. If we assume that the Walsh codes are chosen randomly when $N_u^{(k)} < N_c$, it can be shown that $W_{ij}^{(k)}[n]/N_u^{(k)}$ is a linear function of a hypergeometric random variable. Its variance is $(N_c - N_u^{(k)})/(N_u^{(k)}(N_c - 1))$. Therefore, those off-diagonal elements which are not zero have zero mean and the variance shown in the plot in Fig. 2. For nearly all values of $N_u^{(k)}$, the variance is clearly quite small. So in all cases, we may well approximate $R_{ss}[n]$ by $I$ in Eq. (18) yielding
$$g^{(k)}[m] = (\mathcal{H}\mathcal{H}^H + R_{\eta\eta})^{-1} \mathcal{H} R_{bs}[m]. \qquad (27)$$
We will see through simulation that this approximation works quite well when compared to the “exact” equalizer constructed with a time-varying $R_{ss}$.
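The Hadamard-row argument behind the bound on the off-diagonal terms can be checked numerically. The Python sketch below is only an illustration under assumptions of our own: it uses a Sylvester-ordered Hadamard matrix of size 8 rather than 64, real-valued codes, and it assigns the first N_u rows as the active codes (the paper considers random selection). The names `hadamard` and `code_sum` are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows


def code_sum(walsh, num_users, p, q):
    """Sum over the first num_users codes of c_u[p] * c_u[q] (real codes)."""
    return sum(walsh[u][p] * walsh[u][q] for u in range(num_users))


NC = 8  # small spreading factor for illustration (the paper simulates 64)
W = hadamard(NC)

# The Schur product of two distinct rows is again a row, with NC/2 ones.
product = [a * b for a, b in zip(W[2], W[5])]
print(product in W, product.count(1), product.count(-1))

# Largest partial sum over distinct chip positions, for each number of users.
worst = [
    max(abs(code_sum(W, nu, p, q)) for p in range(NC) for q in range(NC) if p != q)
    for nu in range(1, NC + 1)
]
print(worst)
```

For every loading the largest partial sum stays within min(N_u, N_c - N_u), and it vanishes when all N_c codes are active, which is the fully loaded case in which the source covariance is exactly the identity.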
The $i$th element of $R_{bs}[m]$ is (with $n = mN_c + D$):
$$E\{b_j^{(k)*}[m]\, s^{(k)}[n+1-i]\} = \begin{cases} c_{bs}^{(k)}[n+1-i]\, c_j^{(k)}[D+1-i], & \text{for } 0 \le D+1-i \le N_c - 1 \\ 0 & \text{otherwise} \end{cases} \qquad (28)$$
With $D$ satisfying $N_c - 1 \le D \le L + N_g - 2$, the entire Walsh code for the desired user appears in $R_{bs}[m]$ and
$$R_{bs}[m] = [\,0_{D+1-N_c} \;\; c_j^T[m] \;\; 0_{L+N_g-2-D}\,]^T \qquad (29)$$
$$c_j[m] = [\,c_{bs}^{(k)}[(m+1)N_c - 1]\, c_j^{(k)}[N_c - 1],\; \ldots,\; c_{bs}^{(k)}[mN_c + 1]\, c_j^{(k)}[1],\; c_{bs}^{(k)}[mN_c]\, c_j^{(k)}[0]\,]^T \qquad (30)$$
While the equalizer varies from symbol to symbol due to variation in both $R_{ss}[n]$ and $R_{bs}[m]$, by approximating $R_{ss}[n]$ by $I$, the variation is confined to $R_{bs}[m]$.
5. RAKE RECEIVER
The RAKE receiver is simply a multipath-incorporating matched filter. In particular, the RAKE can be viewed as a chip-spaced filter matched to the channel, followed by correlation with the long code times channel code. Note that, in practice, these operations are normally performed in the reverse order; they may be interchanged under short-time LTI assumptions. The RAKE receiver is exactly represented by the “Chip-Level” portion of Figure 1, if we let $N_g = L$ and $g_i^{(k)}[n] = h_i^{(k)}[L - n]$, $n = 0, \ldots, L - 1$, $i = 1, \ldots, M$.
6. SIMULATION RESULTS
A wideband CDMA forward link was simulated similar to one of the options in the US cdma2000 proposal [1]. The spreading factor is $N_c = 64$ chips per bit. Simulations were performed for both “saturated cells,” that is, all 64 possible channel codes active, as well as lightly loaded cells with 8 channel codes active. The chip rate is 3.6864 MHz ($T_c = 0.27\ \mu$s), 3 times that of IS-95. The data symbols are BPSK which, for each user, are spread with a length 64 Walsh-Hadamard sequence. The signals for all the users are of equal power and summed synchronously, and each base-station had the same number of users. The sum signal is scrambled with a multiplicative QPSK spreading sequence (“scrambling code”) of length 32768 similar to the IS-95 standard.
The uncoded BER results are averaged over different channels for varying SNRs. The channels were generated according to the model presented in Section 2. “SNR” is defined to be the ratio of the sum of the average powers of the received signals from the desired base-station, to the average noise power, after chip-matched filtering. “SNR per user per symbol” is the SNR divided by the number of users and multiplied by the spreading factor. For the chip-level MMSE, the total delay of the signal, $D$, through both channel and equalizer, was chosen to minimize the MSE of the equalizer.
We first present results for a receiver near the base-station so that out-of-cell interference is negligible. Two receive antennas are employed with no oversampling. Two equalizer lengths were simulated: for chip-level, $N_g = 57$ and 114, while for symbol-level, the length is chosen $N_c - 1$ longer. Since the chip-level equalizer is followed by correlation with the channel code times long code, its effective length is $N_g + N_c - 1$; hence, a fair comparison between the symbol-level and chip-level sets the symbol-level equalizer longer by $N_c - 1$ chips. Figure 3 presents the results for the fully loaded cell case, i.e. 64 equal power users were simulated. The RAKE receiver is significantly degraded at high SNR by the MAI, which is seen in the Figure as a BER floor for SNR greater than 10 dB. The chip- and symbol-level equalizers perform much better than the RAKE. Increasing the equalizer length improves performance for both chip-level and symbol-level. Comparing the length 57 chip-level to 120 symbol-level, we observe little improvement in the symbol level at low SNR with increasing improvement, up to 2-3 dB, at high SNR. Comparing length 114 chip-level to 177 symbol-level also shows an improvement that increases with SNR, but less of an improvement than for the shorter equalizers. Note that since all 64 channel codes are present and have equal power, $R_{ss} = I$ and the symbol-level MMSE estimate is optimal in the MSE sense.
In Figure 4, once again the out-of-cell interference is assumed negligible. In this simulation only 8 equal power channel codes are active, i.e., the cell is only lightly to moderately loaded. In this simulation the RAKE receiver does much better since it experiences less in-cell MAI than for 64 users. For the range of SNR simulated the chip-level equalizer does only slightly better than the RAKE receiver. As for the fully loaded cell, the symbol-level equalizer performs better than the chip-level equalizer. For comparison the “optimal” symbol-level equalizer is shown which involves a matrix inverse for every symbol (as in Equation (18)); this equalizer is only slightly better than the symbol-level equalizer presented in this paper. This result justifies the assumption / simplification that $R_{ss}$ is proportional to $I$, even when $N_u < N_c$.
Figure 5 results from a simulation with two base-stations, each with 64 equal power users. The 2nd base-station is treated as interference and is received with the same power as the 1st, desired user’s base-station. Specifically,
$$\sum_{m=1}^{M} E\{|y_m^{(1)}[n]|^2\} = \sum_{m=1}^{M} E\{|y_m^{(2)}[n]|^2\}. \qquad (31)$$
In addition to two independent antennas, two-times oversampling is employed for a total of four chip-spaced channels. The results are very analogous to the single base-station case: the symbol-level out-performs the chip-level, increasingly so at high SNR. However the improvement is more dramatic, especially for the shorter lengths.
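The “SNR per user per symbol” defined in the simulation setup is a simple rescaling of the SNR, and the quoted chip duration follows directly from the chip rate. The short Python sketch below works both out for the two cell loadings simulated; the function name and the 10 dB example value are illustrative assumptions.

```python
import math


def snr_per_user_per_symbol_db(snr_db, num_users, spreading_factor):
    """SNR divided by the number of users, times the spreading factor, in dB."""
    return snr_db - 10 * math.log10(num_users) + 10 * math.log10(spreading_factor)


CHIP_RATE_HZ = 3.6864e6          # 3 x the IS-95 chip rate of 1.2288 MHz
chip_duration_us = 1e6 / CHIP_RATE_HZ

full_load = snr_per_user_per_symbol_db(10.0, num_users=64, spreading_factor=64)
light_load = snr_per_user_per_symbol_db(10.0, num_users=8, spreading_factor=64)
print(round(chip_duration_us, 4), round(full_load, 2), round(light_load, 2))
```

In the fully loaded cell the two SNR definitions coincide, while with 8 of 64 codes active the per-user-per-symbol figure is about 9 dB higher than the SNR.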
7. CONCLUSIONS
The symbol-level equalizer derived here performs better than the chip-level equalizer, although at a greater computational cost. In fact, our simulations have shown that even though the equalizer is sub-optimal, its performance closely approaches that of the optimal equalizer. The approximation that the source covariance is diagonal means that a matrix inverse is required only as often as the channel changes (and not every symbol), and hence the computational complexity is much smaller than that of the optimal equalizer.
REFERENCES
[1] Telecommunications Industry Association, “Physical
Layer Standard for cdma2000 Standards for Spread
Spectrum Systems - TIA/EIA/IS-2000.2-A”, TIA/EIA
Interim Standard, March 2000.
[2] Anja Klein, “Data Detection Algorithms Specially Designed for the Downlink of CDMA Mobile Radio Systems”, in IEEE 47th Vehicular Technology Conference Proceedings, pp. 203-207, Phoenix, AZ, May 4-7 1997.
[3] Colin D. Frank and Eugene Visotsky, “Adaptive Inter¬
ference Suppression for Direct-Sequence CDMA Sys¬
tems with Long Spreading Codes”, in Proceedings 36th
Allerton Conf. on Communication , Control, and Com¬
puting, pp. 411-420, Monticello, IL, Sept. 23-25 1998.
[4] I. Ghauri and D. T. M. Slock, “Linear receivers for the DS-CDMA downlink exploiting orthogonality of spreading sequences”, in Conf. Rec. 32nd Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 1998.
[5] Kari Hooli, Matti Latva-aho, and Markku Juntti, “Multiple Access Interference Suppression with Linear Chip Equalizers in WCDMA Downlink Receivers”, in Proc. Global Telecommunications Conf., pp. 467-471, Rio de Janeiro, Brazil, Dec. 5-9 1999.
[6] Stefan Werner and Jorma Lilleberg, “Downlink Chan¬
nel Decorrelation in CDMA Systems with Long Codes” ,
in IEEE 49th Vehicular Technology Conference Pro¬
ceedings, vol. 2, pp. 1614-1617, Houston, TX, May 16-
19 1999.
[7] Thomas P. Krauss and Michael D. Zoltowski, “MMSE
Equalization Under Conditions of Soft Hand-Off” , in
IEEE Sixth International Symposium on Spread Spec¬
trum Techniques & Applications (ISSSTA 2000) (to ap¬
pear), September 6-8 2000.
[8] T. Krauss and M. Zoltowski, “Oversampling Diversity
Versus Dual Antenna Diversity for Chip-Level Equal¬
ization on CDMA Downlink”, in Proceedings of First
IEEE Sensor Array and Multichannel Signal Processing
Workshop, Cambridge, MA, March 16-17 2000.
[9] Hui Liu and Mike Zoltowski, “Blind equalization in
antenna array CDMA systems”, IEEE Transactions
on Signal Processing, vol. 45, pp. 161-172, Jan. 1997.
[10] A. Klein, G. Kaleh, and P. Baier, “Zero Forc¬
ing and Minimum Mean- Square-Error Equalization for
Multiuser Detection in Code-Division Multiple- Access
Channels”, IEEE Transactions on Vehicular Technol¬
ogy, vol. 45, pp. 276-287, May 1996.
Figure 3. Fully loaded cell, all 64 channel codes in use.
Figure 4. Lightly loaded cell, 8 out of 64 active channel codes.
Figure 5. One interfering base-station of equal power, 64 channel codes per cell.
TRANSFORM DOMAIN ARRAY PROCESSING FOR CDMA SYSTEMS
Yimin Zhang†, Kehu Yang* and Moeness G. Amin†
† Department of Electrical and Computer Engineering,
Villanova University, Villanova, PA 19085
E-mail: zhang@ece.vill.edu, moeness@ece.vill.edu
* ATR Adaptive Communications Research Laboratories,
Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
E-mail: yang@acr.atr.co.jp
ABSTRACT
In this paper, we propose transform domain array process¬
ing schemes for DS-CDMA communications. Space-time
adaptive processing (STAP) is a useful means to combat
the multiuser interference (MUI) in CDMA systems. The
computation burden and slow convergence are two major
problems in implementing the STAP. This paper proposes
optimum and sub-optimum transform domain arrays with
different feedback schemes for CDMA communications. The
transform domain arrays require less computation than
traditional implementation methods and offer improved
convergence performance, leading to an efficient
system implementation.
1. INTRODUCTION
Array processing in direct-sequence code division multiple
access (DS-CDMA) communications has recently attracted
considerable attention [1, 2, 3]. The use of the joint space-
time adaptive processing (STAP), which includes two-di¬
mensional RAKE (2-D RAKE) receiver, provides excellent
performance of suppressing the multiuser interference (MUI)
and inter-symbol interference (ISI) as well as combining the
multipath signals to achieve the RAKE diversity effect in
frequency-selective fading. In order to combine a sufficient
number of multipath rays to enhance the signal power and
reduce the ISI, a large number of weights are required at
the feedback loop. The complexity and convergence rate
problems remain the bottleneck in implementing
these systems [4].
In this paper, we propose a transform domain approach to chip-level space-time adaptive processing for DS-CDMA communications with different feedback schemes. Chip-level space-time adaptive processing effectively mitigates both MUI and ISI before despreading and, as such, only a simple correlation and summation operation with the desired user’s code is required to follow. When a subband array is applied to the chip-rate STAP processing, the signal decorrelation using orthogonal transforms and feedback schemes greatly reduces the circuit size within each single feedback loop, and subsequently improves the receiver convergence performance [5, 6]. The Discrete Fourier Transform (DFT), filter banks and wavelets are among the commonly used orthogonal transforms for this purpose [7]. In this paper, we consider the DFT as the example. Decimation available in the transform domain processing also makes it possible to reduce the signal processing speed at each transform domain bin [5, 6].
The work of Y. Zhang and M. G. Amin is supported by the Office of Naval Research under Grant N00014-98-1-0176.
2. SPACE-TIME ADAPTIVE PROCESSING
FOR CDMA
We consider a base station using an antenna array of $N$ sensors with $P$ users. In CDMA systems, usually $P > N$. The received signal vector at the array is expressed, in discrete-time form sampled at the chip rate, as
$$\mathbf{x}(k) = \sum_{p=1}^{P} \sum_{l=-\infty}^{\infty} d_p(l)\, \mathbf{h}_p(k - l) + \mathbf{b}(k) \qquad (1)$$
where $d_p(k)$ and $\mathbf{h}_p(k)$ are the chip-rate sequence and the channel response vector of the $p$th user, and $\mathbf{b}(k)$ is the additive noise vector.
In CDMA communications, each symbol is spread into $L$ chips. Without loss of generality, we denote the signal of the user of interest as $s_1(n)$, and the signals from other users as $s_p(n)$, $p = 2, \ldots, P$. Aperiodic spreading sequences are assumed. The chip length is $L = T/T_c$, where $T$ and $T_c$ are, respectively, the symbol duration and chip duration. We denote the spreading sequence for the $n$th symbol of the $P$ users as $c_p(n, l)$, $p = 1, \ldots, P$, $l = 1, \ldots, L$. Then,
$$d_p(k) = s_p(n)\, c_p(n, l - l_p) \qquad (2)$$
where $k = nL + l$, and $l_p$ ($0 \le l_p < L$) is the chip delay index that models the asynchronous system. We make the following assumptions:
A1) The information symbols $s_p(n)$, $p = 1, 2, \ldots, P$, are wide-sense stationary and i.i.d. with $E[s_p(n) s_p^*(n)] = 1$.
A2) The spreading sequences $c_p(n, l)$, $p = 1, 2, \ldots, P$, $l = 1, \ldots, L$, are assumed independent random sequences.
0-7803-5988-7/00/$10.00 © 2000 IEEE
A3) All channels $\mathbf{h}_p(k)$, $p = 1, 2, \ldots, P$, are linear time-invariant, and of a finite duration within $[0, DT_c]$. That is, $\mathbf{h}_p(k) = 0$, $p = 1, 2, \ldots, P$, for $k > D$ and $k < 0$.
A4) The noise vector $\mathbf{b}(k)$ is zero-mean, temporally and spatially white with
$$E[\mathbf{b}(k)\mathbf{b}^T(k + l)] = 0 \quad \text{for any } l$$
and
$$E[\mathbf{b}(k)\mathbf{b}^H(k + l)] = \sigma I_N \delta(l),$$
where the superscripts $T$ and $H$ denote transpose and conjugate transpose, respectively, $I_N$ is the $N \times N$ identity matrix, and $\delta(l)$ is the Kronecker delta function.
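By stacking $M$ consecutive chips of $\mathbf{x}(k)$, we can obtain
$$\underline{\mathbf{x}}(k) = \sum_{p=1}^{P} \mathcal{H}_p \mathbf{d}_p(k) + \underline{\mathbf{b}}(k) = \mathcal{H}\mathbf{d}(k) + \underline{\mathbf{b}}(k), \qquad (3)$$
where
$$\underline{\mathbf{x}}(k) = [\,\mathbf{x}^T(k) \;\; \mathbf{x}^T(k-1) \;\cdots\; \mathbf{x}^T(k-M+1)\,]^T, \qquad (4)$$
$$\mathbf{d}_p(k) = [\,d_p(k) \;\; d_p(k-1) \;\cdots\; d_p(k-M+1)\,]^T, \qquad (5)$$
$$\mathbf{d}(k) = [\,\mathbf{d}_1^T(k) \;\; \mathbf{d}_2^T(k) \;\cdots\; \mathbf{d}_P^T(k)\,]^T, \qquad (6)$$
$$\mathcal{H}_p = \begin{bmatrix} \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & 0 & \cdots & 0 \\ 0 & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & \cdots & 0 \\ \vdots & & \ddots & & \ddots & \vdots \\ 0 & \cdots & 0 & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) \end{bmatrix}, \qquad (7)$$
$$\mathcal{H} = [\,\mathcal{H}_1 \;\; \mathcal{H}_2 \;\cdots\; \mathcal{H}_P\,], \qquad (8)$$
and
$$\underline{\mathbf{b}}(k) = [\,\mathbf{b}^T(k) \;\; \mathbf{b}^T(k-1) \;\cdots\; \mathbf{b}^T(k-M+1)\,]^T. \qquad (9)$$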
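Denote $\mathbf{w}$ as the weight vector of the STAP system corresponding to $\underline{\mathbf{x}}(k)$; the output of the STAP becomes
$$y(k) = \mathbf{w}^T \underline{\mathbf{x}}(k). \qquad (10)$$
The optimum weight vector under the minimum mean square error (MMSE) criterion
$$\min_{\mathbf{w}} E\,|y(k) - d_1(k - v)|^2 \qquad (11)$$
is given by the Wiener-Hopf solution
$$\mathbf{w}_{opt} = R^{-1}\mathbf{r}, \qquad (12)$$
where $v \ge 0$ is a delay to minimize the MMSE,
$$R = E[\underline{\mathbf{x}}^*(k)\, \underline{\mathbf{x}}^T(k)], \qquad (13)$$
$$\mathbf{r} = E[\underline{\mathbf{x}}^*(k)\, d_1(k - v)], \qquad (14)$$
and the superscript $*$ denotes complex conjugate. The training signal is assumed to be an ideal replica of $d_1(k)$.
From the assumptions A1) - A4), (13) and (14) can be expressed as
$$R = \mathcal{H}^* \mathcal{H}^T + \sigma I_{MN}, \qquad (15)$$
$$\mathbf{r} = \mathcal{H}_1^* \mathbf{e}_v, \qquad (16)$$
respectively, where
$$\mathbf{e}_v = [\,\underbrace{0 \;\cdots\; 0}_{v} \;\; 1 \;\; 0 \;\cdots\; 0\,]^T. \qquad (17)$$
The MMSE is given by
$$\mathrm{MMSE} = E\,|\mathbf{w}_{opt}^T \underline{\mathbf{x}}(k) - d_1(k - v)|^2 = 1 - \mathbf{r}^H R^{-1} \mathbf{r}. \qquad (18)$$
Despreading the array output signal $y(k)$ by the signature code of the desired signal, we obtain the symbol-rate output signal for detection, expressed as
$$z(n) = \sum_{l=0}^{L-1} y(nL + l + v)\, c_1(n, l). \qquad (19)$$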
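The spreading relation (2) and the despreading sum (19) can be illustrated with a noiseless toy example. The Python sketch below rests on assumptions that are not in the paper: real-valued +/-1 aperiodic codes, a synchronous desired user (zero chip delay), zero-based chip indices, and an idealized array output equal to the desired chip sequence delayed by v chips. The names `spread` and `despread` are hypothetical.

```python
import random


def spread(symbols, codes):
    """Chip sequence d(k) = s(n) * c(n, l) with k = n*L + l (zero-based l)."""
    return [s * c for s, row in zip(symbols, codes) for c in row]


def despread(y, codes, v):
    """z(n) = sum over l of y(n*L + l + v) * c(n, l), as in the despreading sum."""
    L = len(codes[0])
    return [sum(y[n * L + l + v] * codes[n][l] for l in range(L))
            for n in range(len(codes))]


rng = random.Random(0)
L, V = 8, 3                                   # chips per symbol, processing delay
symbols = [1, -1, -1, 1]
codes = [[rng.choice((-1, 1)) for _ in range(L)] for _ in symbols]  # aperiodic codes

chips = spread(symbols, codes)
y = [0] * V + chips                           # idealized output: y(k) = d_1(k - v)
z = despread(y, codes, V)
print(z)                                      # each entry is L times the symbol
```

Because every code chip squares to one, the despread output is the symbol scaled by the processing gain L, whatever the random code realization.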
3. TRANSFORM DOMAIN ARRAYS WITH
DIFFERENT FEEDBACK SCHEMES
3.1. Centralized Feedback Scheme
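Performing a transform of $\underline{\mathbf{x}}(n)$ by using an orthogonal matrix $T$, we obtain the received signal vector in the transform domain as
$$\mathbf{x}_T(n) = T\underline{\mathbf{x}}(n) \qquad (20)$$
with
$$\mathbf{x}_T(k) = [\,(\mathbf{x}_T^{(1)}(k))^T \;\; (\mathbf{x}_T^{(2)}(k))^T \;\cdots\; (\mathbf{x}_T^{(M)}(k))^T\,]^T, \qquad (21)$$
where $\mathbf{x}_T^{(m)}(n)$ is the signal vector at the $m$th transform domain bin. Denote $\mathbf{w}_T = [\,(\mathbf{w}_T^{(1)})^T \; (\mathbf{w}_T^{(2)})^T \cdots (\mathbf{w}_T^{(M)})^T\,]^T$ as the weight vector in the transform domain. Then the output of the transform domain array system becomes
$$y_T(k) = \mathbf{w}_T^T \mathbf{x}_T(k) = \mathbf{w}_T^T T \underline{\mathbf{x}}(k). \qquad (22)$$
Again, using the MMSE criterion
$$\min E\,|y_T(k) - d_1(k - v)|^2, \qquad (23)$$
the optimum weight vector is given by
$$\mathbf{w}_{T,opt} = R_T^{-1}\mathbf{r}_T = T^* \mathbf{w}_{opt}, \qquad (24)$$
where
$$R_T = E[\mathbf{x}_T^*(k)\, \mathbf{x}_T^T(k)] = T^* R T^T = (T\mathcal{H})^* (T\mathcal{H})^T + \sigma I_{MN}, \qquad (25)$$
$$\mathbf{r}_T = E[\mathbf{x}_T^*(k)\, d_1(k - v)] = T^* \mathbf{r} = (T\mathcal{H}_1)^* \mathbf{e}_v. \qquad (26)$$
It is easy to verify that the transform domain array with centralized feedback scheme provides the same steady-state MMSE performance, as given by equation (18). The centralized feedback scheme is depicted in Fig. 1.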
Fig. 1 Subband array with centralized feedback.
3.2. Localized Feedback Scheme
We note that the orthogonal transform can reduce the correlation between different transform bins. DFT, filter banks, and wavelets are commonly used methods for providing orthogonal transforms. Here we consider the DFT as the example. Denote
$$T_M = \begin{bmatrix} W_M^0 & W_M^0 & \cdots & W_M^0 \\ W_M^0 & W_M^1 & \cdots & W_M^{M-1} \\ \vdots & \vdots & \ddots & \vdots \\ W_M^0 & W_M^{M-1} & \cdots & W_M^{(M-1)(M-1)} \end{bmatrix} \qquad (27)$$
as the $M \times M$ transform matrix at the output of each array sensor, where
$$W_M = \exp\left(-j\frac{2\pi}{M}\right), \qquad (28)$$
then the transform matrix $T$ becomes
$$T = P_2 (I_N \otimes T_M) P_1, \qquad (29)$$
where $\otimes$ denotes the Kronecker product. In (29), $P_1$ is a permutation matrix to change the order of the vector $\underline{\mathbf{x}}(n)$ such that the $M$ samples at each array sensor align together, and $P_2$ is another permutation matrix that allows the $N$ data of each bin to align together.
$T$ can be expressed in the form
$$T = [\,(T^{(1)})^T \;\; (T^{(2)})^T \;\cdots\; (T^{(M)})^T\,]^T \qquad (30)$$
where $T^{(m)}$ is the $N \times NM$ submatrix of the matrix $T$ corresponding to the $m$th bin. Denote
$$\mathbf{x}_T^{(m)}(n) = T^{(m)} \underline{\mathbf{x}}(n) \qquad (31)$$
as the signal vector at the $m$th subband. When the signal correlation between different transform bins is small, we ignore the off-block-diagonal elements of the correlation matrix $R_T$, yielding an approximation by the block-diagonal matrix
$$R'_T = \begin{bmatrix} R_T^{(1)} & 0 & \cdots & 0 \\ 0 & R_T^{(2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_T^{(M)} \end{bmatrix} \qquad (32)$$
where
$$R_T^{(m)} = E[(\mathbf{x}_T^{(m)}(n))^* (\mathbf{x}_T^{(m)}(n))^T] = (T^{(m)}\mathcal{H})^* (T^{(m)}\mathcal{H})^T + \sigma I_N \qquad (33)$$
is the signal covariance matrix of $\mathbf{x}_T^{(m)}(n)$. Using the property of block-diagonal matrices, we have
$$(R'_T)^{-1} = \begin{bmatrix} (R_T^{(1)})^{-1} & 0 & \cdots & 0 \\ 0 & (R_T^{(2)})^{-1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & (R_T^{(M)})^{-1} \end{bmatrix}. \qquad (34)$$
Therefore, the matrix inversion of dimension $NM \times NM$ becomes $M$ parallel matrix inversions of dimension $N \times N$; as such, the computations can be greatly reduced. When recursive methods are used, it is realized by using $M$ parallel control loops with $N$ weights in each loop. The localized feedback scheme is shown in Fig. 2.
Fig. 2 Subband array with localized feedback.
We use $d_1(k)$ as the reference signal at each transform bin. In this case, the cross-correlation vector between the received signal vector and the reference signal at the $m$th transform bin becomes
$$\mathbf{r}_T^{(m)} = E[(\mathbf{x}_T^{(m)}(k))^* d_1(k - v)] = [T^{(m)}\mathcal{H}_1]^* \mathbf{e}_v. \qquad (35)$$
In the localized feedback scheme, the weight vector at each bin can be obtained from the $N \times N$ correlation matrix $R_T^{(m)}$ and the $N \times 1$ correlation vector $\mathbf{r}_T^{(m)}$, which are determined only by the data vector and reference signal at that bin, i.e.,
$$\mathbf{w}_T'^{(m)} = (R_T^{(m)})^{-1} \mathbf{r}_T^{(m)}. \qquad (36)$$
Therefore, the centralized feedback transform domain array can be approximated by a set of parallel independent rank-reduced adaptive array processors at each bin, at the cost of ignoring the correlation between signals at different bins. Such a transform domain array with the localized feedback scheme can be easily implemented by using a set of parallel array processors, each with the number of weights equal to $N$, instead of $NM$.
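To give a feel for the saving, the following Python sketch compares rough operation counts for one NM x NM inversion against M inversions of size N x N. It assumes a direct dense inversion costs on the order of n cubed operations, which is a standard rule of thumb and not a figure taken from the paper; the function names and the example sizes N = 4, M = 8 are likewise illustrative.

```python
def inversion_cost(n):
    """Order-of-magnitude cost of a direct n x n matrix inversion (assumed n^3)."""
    return n ** 3


def centralized_cost(num_sensors, num_bins):
    """One inversion of the full NM x NM covariance matrix."""
    return inversion_cost(num_sensors * num_bins)


def localized_cost(num_sensors, num_bins):
    """M independent inversions of N x N per-bin covariance matrices."""
    return num_bins * inversion_cost(num_sensors)


N, M = 4, 8
full = centralized_cost(N, M)
local = localized_cost(N, M)
print(full, local, full // local)
```

Under this cubic-cost assumption the ratio between the two counts is M squared, independent of the number of sensors.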
It is clear that
$$\mathbf{r}'_T = [\,(\mathbf{r}_T^{(1)})^T \;\; (\mathbf{r}_T^{(2)})^T \;\cdots\; (\mathbf{r}_T^{(M)})^T\,]^T = \mathbf{r}_T. \qquad (37)$$
Therefore, the equivalent full-band weight vector of the localized feedback transform domain array becomes
$$\mathbf{w}'_T = (R'_T)^{-1}\mathbf{r}'_T = (R'_T)^{-1}\mathbf{r}_T. \qquad (38)$$
The corresponding MSE of the localized feedback scheme is given by
$$\mathrm{MSE}_{LF} = 1 + \mathbf{r}_T^H (R'_T)^{-1} R_T (R'_T)^{-1} \mathbf{r}_T - 2\,\mathrm{Re}\left[\mathbf{r}_T^H (R'_T)^{-1} \mathbf{r}_T\right]. \qquad (39)$$
Equation (39) implies that the localized feedback transform domain array approach is suboptimal, and its performance depends on the significance of the cross-correlation between signals at different bins. It is clear from (25) and (39) that the off-block-diagonal elements of the matrix $R_T$, and subsequently the MSE performance of the localized feedback subband array, depend on both the transform matrix $T$ and the channels $\mathcal{H}_p$, $p = 1, 2, \ldots, P$.
Fig. 3 Subband array with partial feedback.
3.3. Partial Feedback Scheme
In the previous subsection, we discussed the transform do¬
main array with localized feedback scheme as an approxi¬
mation of the transform domain array with centralized feed¬
back scheme. Such localized feedback scheme reduces the
number of weights at each bin at the expense of perfor¬
mance reduction, since the off-block-diagonal elements are
not considered in the weight estimation.
A subband array with partial feedback, which is shown
in Fig. 3, is also possible and provides more flexibility in
trading-off the system complexity with the steady-state MSE
performance. As shown below, the partial feedback scheme
is a generalization of the centralized and localized feedback
schemes, which can be considered as two extreme and spe¬
cial cases.
In the transform domain array with partial feedback scheme, the total $M$ bins are divided into $K$ groups. The number of bins in the $i$th group is $M_i$, $i = 1, 2, \ldots, K$, with $M_1 + M_2 + \cdots + M_K = M$. In this paper, we consider the simple case of $M_1 = M_2 = \cdots = M_K = M/K$.
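In this case, the signal covariance matrix $R_T$ is approximated by a new block-diagonal matrix $R''_T$ with larger block size $M_1N$, expressed as
$$R''_T = \begin{bmatrix} R_T^{(G_1)} & 0 & \cdots & 0 \\ 0 & R_T^{(G_2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_T^{(G_K)} \end{bmatrix} \qquad (40)$$
where $R_T^{(G_i)}$ is of dimension $M_1N \times M_1N$. For $M_1 > 1$, fewer off-block-diagonal elements are ignored in $R''_T$ compared to $R'_T$. Therefore, the partial feedback scheme provides more accurate weight estimation, and subsequently better MSE results, as compared with the localized feedback scheme. Similar to the localized feedback case, the weight vector in the partial feedback scheme is given by
$$\mathbf{w}''_T = (R''_T)^{-1}\mathbf{r}''_T = \begin{bmatrix} (R_T^{(G_1)})^{-1}\mathbf{r}_T^{(G_1)} \\ (R_T^{(G_2)})^{-1}\mathbf{r}_T^{(G_2)} \\ \vdots \\ (R_T^{(G_K)})^{-1}\mathbf{r}_T^{(G_K)} \end{bmatrix} \qquad (41)$$
where
$$\mathbf{r}_T^{(G_i)} = E\left[(\mathbf{x}_T^{(G_i)}(k))^* d_1(k - v)\right] \qquad (42)$$
as $d_1(k - v)$ is used as the reference signal at each group, and
$$\mathbf{x}_T^{(G_i)}(k) = [\,(\mathbf{x}_T^{((i-1)M_1+1)}(k))^T \;\cdots\; (\mathbf{x}_T^{(iM_1)}(k))^T\,]^T. \qquad (43)$$
Since
$$\mathbf{r}''_T = [\,(\mathbf{r}_T^{(G_1)})^T \;\; (\mathbf{r}_T^{(G_2)})^T \;\cdots\; (\mathbf{r}_T^{(G_K)})^T\,]^T = \mathbf{r}_T, \qquad (44)$$
the MSE of the partial feedback array is therefore
$$\mathrm{MSE}_{PF} = 1 + \mathbf{r}_T^H (R''_T)^{-1} R_T (R''_T)^{-1} \mathbf{r}_T - 2\,\mathrm{Re}\left[\mathbf{r}_T^H (R''_T)^{-1} \mathbf{r}_T\right]. \qquad (45)$$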
It is noted that the partial feedback scheme simplifies to the centralized feedback scheme when $M_1 = M$. In this case, $R''_T$ becomes $R_T$, and equation (45) becomes equation (18). On the other hand, the localized feedback scheme is achieved by setting $M_1 = 1$. In this case, $R''_T$ becomes $R'_T$, and equation (45) becomes equation (39).
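The way one block size spans all three schemes can be sketched numerically. The Python example below is a toy under assumptions of our own: real-valued signals, a randomly generated mixing matrix standing in for the transformed channel, N = 2 sensors and M = 4 bins, and hand-written Gaussian elimination. All names are hypothetical. A block size of NM reproduces the centralized (Wiener) solution, N gives the localized scheme, and intermediate sizes give partial feedback.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def grouped_weights(R, r, block):
    """Solve each diagonal block of size `block` alone (off-block terms ignored)."""
    w = []
    for s in range(0, len(r), block):
        sub = [row[s:s + block] for row in R[s:s + block]]
        w += solve(sub, r[s:s + block])
    return w


def mse(R, r, w):
    """MSE = 1 + w^T R w - 2 w^T r for a unit-power reference (real case)."""
    quad = sum(w[i] * R[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))
    return 1.0 + quad - 2.0 * sum(wi * ri for wi, ri in zip(w, r))


rng = random.Random(1)
N, M, SIGMA = 2, 4, 0.1                       # toy sizes and noise power
A = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(N * M)]
R = [[sum(a * b for a, b in zip(A[i], A[j])) + (SIGMA if i == j else 0.0)
      for j in range(N * M)] for i in range(N * M)]
r = [row[3] for row in A]                     # correlation with the reference chip

for bins_per_group in (M, 2, 1):              # centralized, partial, localized
    w = grouped_weights(R, r, bins_per_group * N)
    print(bins_per_group, round(mse(R, r, w), 4))
```

In any such example the centralized solution attains the smallest MSE; how much the grouped schemes lose depends on how strongly the bins are correlated.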
4. CONVERGENCE PERFORMANCE
In this section, we consider the convergence performance of
the transform domain arrays with centralized feedback and
localized feedback. The popularly used least mean square
(LMS) algorithm is considered.
One of the key factors affecting the convergence perfor¬
mance in the proposed transform domain arrays is the num¬
ber of controllable weights in the feedback system. In the
transform domain array with centralized feedback scheme, the number of weights is $NM$, whereas in the cases of the transform domain array with localized feedback and partial feedback schemes, the number of weights in each independent control loop is $N$ and $M_1N$, respectively (although the total number of weights over all bins remains $NM$).
It is known that the convergence rate of the LMS algorithm depends on the eigenvalue spread, i.e., the ratio between the maximum and minimum eigenvalues of the covariance matrix [8]. Since the covariance matrix defined at a bin, $R_T^{(m)}$, $m = 1, \ldots, M$, or that defined at several bins, $R_T^{(G_i)}$, $i = 1, \ldots, K$, is a principal submatrix of $R_T$, from the interlacing property [9], the eigenvalue spread of $R_T^{(m)}$ and that of $R_T^{(G_i)}$ are no larger than that of $R_T$. Therefore, the transform domain arrays with localized and partial feedback provide improved convergence performance.
On the other hand, when comparing the STAP system and the transform domain array with centralized feedback scheme, since an orthonormal transform does not change the eigenvalues, it is clear that the eigenvalue spreads of $R$ and $R_T$ are the same. Therefore, the STAP system and the centralized feedback transform domain array offer the same convergence performance [6]. However, if the signal powers at different bins are different (due to, e.g., pulse shaping filtering, frequency-selective channel characteristics), the convergence performance can be improved by performing power compensation at the different bins so that the eigenvalue spread is reduced [10, 11, 12].
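The statement that an orthonormal transform leaves the eigenvalues unchanged rests on the transform being unitary. The Python sketch below builds a DFT matrix and checks that property; the 1/sqrt(M) scaling is our assumption, chosen so that the matrix is exactly unitary, and the function names are hypothetical.

```python
import cmath
import math


def dft_matrix(m):
    """M x M DFT matrix with entries W^(r*c)/sqrt(M), W = exp(-j*2*pi/M)."""
    w = cmath.exp(-2j * math.pi / m)
    scale = 1.0 / math.sqrt(m)            # assumed normalization for unitarity
    return [[scale * w ** (r * c) for c in range(m)] for r in range(m)]


def gram(t):
    """T T^H; equals the identity when T is unitary."""
    m = len(t)
    return [[sum(t[i][k] * t[j][k].conjugate() for k in range(m)) for j in range(m)]
            for i in range(m)]


def apply(t, x):
    return [sum(a * b for a, b in zip(row, x)) for row in t]


T4 = dft_matrix(4)
G = gram(T4)
x = [1.0, -2.0, 0.5, 3.0]
power_in = sum(abs(v) ** 2 for v in x)
power_out = sum(abs(v) ** 2 for v in apply(T4, x))
print(round(power_in, 6), round(power_out, 6))
```

With this scaling the total power of any input vector is preserved, so only the distribution of power across bins changes, which is what power compensation at the bins then exploits.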
5. CONCLUSION
We have analyzed the performance of transform domain
arrays for DS-CDMA systems with different types of feed¬
back schemes, and derived the respective expressions of
the mean square error (MSE). For all proposed schemes,
the transformation is performed in the chip level before
despreading. It has been shown that transform domain
arrays with localized and partial feedback schemes are gen¬
erally suboptimal, and their MSE performance depends on
the transform matrix of the analysis filters as well as the
communication channel characteristics. Since the localized
feedback scheme reduces the number of weights at
the control loop, the convergence rate is usually improved,
which is of practical importance in implementing space-time
adaptive processing in fast fading environments. The
partial feedback scheme generalizes the other two proposed
schemes, namely, the centralized and localized feedback sys¬
tems. This scheme provides the flexibility to balance the
system complexity with the steady-state and convergence
performance.
REFERENCES
[1] A. J. Paulraj and C. B. Papadias, “Space-time processing for wireless communications,” IEEE Signal Processing Magazine, vol. 14, no. 6, pp. 49-83, Nov. 1997.
[2] U. Madhow and M. Honig, “MMSE interference sup¬
pression for direct-sequence spread-spectrum CDMA,”
IEEE Trans. Commun., vol. 42, pp. 3178-3188, Dec.
1994.
[3] H. Liu and M. D. Zoltowski, “Blind equalization in
antenna array CDMA systems,” IEEE Trans. Signal
Processing , vol. 45, no. 1, pp. 161-172, Jan. 1997.
[4] U. Madhow, “Blind adaptive interference suppression
for direct-sequence CDMA,” Proc. IEEE, vol. 86, no.
10, pp. 2049-2069, Oct. 1998.
[5] Y. Zhang, K. Yang, and M. G. Amin, “Adaptive sub¬
band arrays for multipath fading mitigation,” in Proc.
IEEE AP-S Int. Symp., Atlanta, GA, pp. 380-383, June
1998.
[6] Y. Kamiya and Y. Karasawa, “Performance comparison
and improvement in adaptive arrays based on the time
and frequency domain signal processing,” IEICE Trans.
Commun., vol. J82-A, no. 6, pp. 867-874, June 1999.
[7] G. Strang and T. Nguyen, Wavelets and Filter Banks,
Wellesley-Cambridge, 1996.
[8] S. Haykin, Adaptive Filter Theory, 3rd Ed. New Jersey:
Prentice Hall, 1996.
[9] G. H. Golub and C. F. Van Loan, Matrix Computations,
3rd Ed. Maryland: Johns Hopkins Univ. Press, 1996.
[10] J. C. Lee and C. K. Un, “Performance analysis of
frequency-domain block LMS adaptive digital filters,”
IEEE Trans. Circuits and Systems, vol. 36, no. 2, pp.
173-189, Feb. 1989.
[11] M. de Courville and P. Duhamel, “Adaptive filtering in
subbands using weighted criterion,” IEEE Trans. Signal
Processing, vol. 46, no. 9, pp. 2359-2371, Sept. 1998.
[12] K. Yang, Y. Zhang, and Y. Mizuguchi, “Subband real¬
ization of space-time adaptive processing for mobile
communications,” in Proc. 10th Int. Symp. on Personal,
Indoor and Mobile Radio Communications, Osaka, Sept.
1999.
Sectorized Space-Time Adaptive Processing
for CDMA Systems
Kehu Yang, Yoshihiko Mizuguchi and Yimin Zhang 2
1 ATR Adaptive Communications Research Laboratories,
Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
Email: yang@acr.atr.co.jp
2 Department of Electrical and Computer Engineering,
Villanova University, Villanova, PA 19085, USA
Email: zhang@ece.vill.edu
ABSTRACT
Space-time adaptive processing (STAP) is an effective
technique of suppressing both the multiuser access
interference (MUAI) and the inter-symbol interference (ISI)
in wideband CDMA mobile communication systems.
However, its complexity is one of the key problems in
practical implementations. In this paper we propose
adaptive antenna techniques that realize low-complexity
space-time adaptive processing within a given spatial sector
by spatial-smoothing subarray beamforming sectorization.
The proposed technique achieves performance close to that of
the associated optimum element-space STAP system.
I. INTRODUCTION
In direct-sequence code-division multiple-access (DS-CDMA)
systems, adaptive antennas operating under the scheme of
space-time adaptive processing (STAP) [1, 2] are also called
two-dimensional RAKE (2-D RAKE) receivers [3], and are
known to be effective in suppressing both the
multiuser access interference (MUAI) and the inter-symbol
interference (ISI). However, the prohibitive computational
complexity of STAP systems is one of the key problems
restricting their application to practical systems. To reduce this
complexity, optimal and sub-optimal approaches based on
parallel implementation and low-rank transformations have
been proposed [4-8].
Beamspace-based partially adaptive processing methods
are sub-optimal approaches widely used in array signal
processing, where reduced-dimension processing is
performed by employing a few beams that encompass the
significant signal components [4, 9]. The sectorized
beamspace adaptive diversity combiner is one such
application, and is effective in combating multipath
fading in wireless communications [4]. References [5]
and [6] proposed two other approaches that involve
wideband beamforming and reduced-dimension
beamforming, respectively.
In this paper we propose novel low-complexity sectorized
adaptive antenna techniques which use spatial-smoothing
subarray beamformers to achieve effective beam
diversity as well as sufficient degrees of freedom (DOF's)
for MUAI suppression. In the proposed techniques, the full
field of view is divided into a number of spatial sectors,
and the sectorized STAP is performed individually in each. The
array is partitioned into a set of subarrays, each of which forms a
beam to cover the same sector of interest. In the
sector of interest, the number of MUAI's is greatly reduced
compared with the full field-of-view condition. The sectorized STAP
scheme combines the advantages of reduced-rank
beamspace processing and spatio-temporal processing
techniques. In comparison with conventional STAP
systems operating over the full field of view, the complexity
of the sectorized processing is greatly reduced, whereas the
performance loss relative to the optimum STAP system can
be kept small.
II. ARRAY SIGNAL MODEL
Consider a cellular CDMA base station using an antenna
array of $N$ ($N > 1$) elements with $P$ users. The $p$-th user's
baseband transmitted waveform is expressed as
$$s_p(t) = \sum_{m=-\infty}^{\infty} s_p(m)\, p_p(t - mT), \tag{1}$$
where $s_p(m)$ denotes the $m$-th information symbol of the
$p$-th user,
$$p_p(t) = \sum_{j=0}^{N_c-1} c_p(j)\, \psi(t - jT_c), \quad 0 < t < T \tag{2}$$
represents the signature waveform of the $p$-th user,
$\{c_p(j)\}_{j=0}^{N_c-1}$ is the spreading code assigned to the $p$-th user,
$N_c$ is the number of chips per symbol, $\psi(t)$ is the
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
normalized chip waveform limited to $[0, T_c]$, and $T_c$ is
the chip interval. The spreading sequence can be periodic or
aperiodic, depending on the standard to be used. In this
paper, we consider the periodic case, i.e., the non-random
CDMA systems.
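As a minimal sketch of the spreading operation in (1) and (2), the snippet below builds a chip-rate sequence by multiplying each symbol by a periodic code. It assumes a rectangular chip pulse sampled once per chip; the function name `spread`, the length-4 code, and the two symbols are hypothetical illustration values, not parameters from the paper.

```python
from typing import List


def spread(symbols: List[complex], code: List[int]) -> List[complex]:
    """Return the chip-rate sequence: every symbol s(m) is repeated over
    the Nc chips of the (periodic) code c(j), giving s(m) * c(j)."""
    return [s * c for s in symbols for c in code]


# Hypothetical length-4 code and two BPSK symbols (illustrative only).
code = [1, -1, 1, 1]
chips = spread([1, -1], code)
print(chips)
```

Each symbol occupies `len(code)` consecutive chips, which is the discrete counterpart of $T = N_c T_c$.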
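The array received signal vector $\mathbf{x}(t)$ is denoted as
$$\mathbf{x}(t) = \sum_{p=1}^{P}\sum_{l=1}^{L_p} \mathbf{a}(\theta_l^p)\,\xi_l^p\, s_p(t-\tau_l^p) + \mathbf{n}(t)
= \sum_{p=1}^{P}\sum_{m=-\infty}^{\infty} s_p(m)\,\mathbf{g}_p(t-mT) + \mathbf{n}(t), \tag{3}$$
where
$$\mathbf{g}_p(t) = \sum_{l=1}^{L_p} \mathbf{a}(\theta_l^p)\,\xi_l^p\, p_p(t-\tau_l^p), \tag{4}$$
and $\{\theta_l^p, \tau_l^p, \xi_l^p\}$ express, respectively, the angle-of-arrival
(AOA), the time delay, and the propagation loss
corresponding to the $l$-th path of the $p$-th user. Moreover,
$\mathbf{a}(\theta)$ is the array steering vector corresponding to $\theta$;
$s_p(m)$ denotes the $m$-th information symbol of the $p$-th user,
$L_p$ is the total number of multipath rays of the $p$-th user,
$T = N_c T_c$ is the symbol duration, and $\mathbf{n}(t)$ is the array
noise vector.
Defining
$$\mathbf{h}_p(t) = \sum_{l=1}^{L_p} \mathbf{a}(\theta_l^p)\,\xi_l^p\, \psi(t-\tau_l^p) \tag{5}$$
as the channel response of the $p$-th user, we can rewrite (3)
as
$$\mathbf{x}(t) = \sum_{p=1}^{P}\sum_{m=-\infty}^{\infty}\sum_{j=0}^{N_c-1} s_p(m)\,c_p(j)\,\mathbf{h}_p(t - jT_c - mT) + \mathbf{n}(t). \tag{6}$$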
We make the following assumptions:
A1) The information symbols $s_p(m)$, $p = 1, \ldots, P$, are
i.i.d. and satisfy $E\{s_p(m)s_q^*(n)\} = \delta_{pq}\delta_{mn}$, where $(\cdot)^*$
denotes complex conjugation and $\delta_{pq}$ denotes the
Kronecker delta function.
A2) The channels $\{\mathbf{h}_p(t),\ p = 1, \ldots, P\}$ are linear and
time-invariant with a finite duration within $[0, D_pT_c]$. Here,
we assume $D_pT_c > T$ for wideband CDMA channels.
A3) The noise vector is zero-mean, temporally and
spatially white with $E\{\mathbf{n}(t)\mathbf{n}^T(t)\} = \mathbf{0}$ and
$E\{\mathbf{n}(t)\mathbf{n}^H(t)\} = \sigma^2\mathbf{I}$, where $(\cdot)^T$ and $(\cdot)^H$ denote
transpose and conjugate transpose, respectively, $\sigma^2$
expresses the noise power, and $\mathbf{I}$ is the identity matrix. The
noise vector is also assumed to be uncorrelated with the user
signals.
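Denote $\Delta = T_c/J$ as the sampling interval, where $J \ge 1$ is an
integer which expresses the oversampling factor. Thus,
sampling at $t = i\Delta + nT_c$, the discrete form of (6) becomes
$$\mathbf{x}(i\Delta + nT_c) = \sum_{p=1}^{P}\sum_{m=-\infty}^{\infty}\sum_{j=0}^{N_c-1} s_p(m)\,c_p(j)\,
\mathbf{h}_p(i\Delta + nT_c - jT_c - mT) + \mathbf{n}(i\Delta + nT_c),
\quad i = 0, \ldots, J-1. \tag{7}$$
By stacking $\mathbf{x}(i\Delta + nT_c)$, $i = 0, \ldots, J-1$, we have
$$\mathbf{x}(n) = \sum_{p=1}^{P}\sum_{d=0}^{D_p} \mu_p(n-d)\,\mathbf{h}_p(d) + \mathbf{n}(n), \tag{8}$$
where $\mu_p(n)$ is the chip-rate signal sequence of the $p$-th
user. In (8), we use the notation $\mathbf{a}(n) = [\mathbf{a}^T(nT_c), \ldots,
\mathbf{a}^T(nT_c + (J-1)\Delta)]^T$, where $\mathbf{a}$ denotes either $\mathbf{x}$, $\mathbf{h}$ or $\mathbf{n}$.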
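The chip-rate model (8) is a vector convolution of each user's chip sequence with its sampled channel. The sketch below evaluates the noise-free contribution of a single user at one time index; the function name `received`, the two-entry channel vectors, and the short chip sequence are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from typing import List

Vector = List[complex]


def received(mu: List[complex], h: List[Vector], n: int) -> Vector:
    """Noise-free x(n) = sum_{d=0}^{D} mu(n - d) h(d) for one user.
    mu(k) is treated as zero outside the stored range."""
    size = len(h[0])
    x = [0j] * size
    for d, h_d in enumerate(h):
        k = n - d
        if 0 <= k < len(mu):
            for i in range(size):
                x[i] += mu[k] * h_d[i]
    return x


# Hypothetical chip sequence and channel with D = 1 (two taps, two entries each).
mu = [1, -1, 1]
h = [[1 + 0j, 1j], [0.5 + 0j, 0j]]
x1 = received(mu, h, 1)  # mu(1) h(0) + mu(0) h(1)
print(x1)
```

Summing such terms over all users and adding a noise vector gives the full right-hand side of (8).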
III. SYMBOL-LEVEL PROCESSING
1. Chip-level optimum adaptive processing
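For the consecutive samples during a period of $M$ chips
($M \ge N_c$), we form the following vectors
$$\mathbf{X}(n) = [\mathbf{x}^T(n), \mathbf{x}^T(n-1), \ldots, \mathbf{x}^T(n-M+1)]^T, \tag{9}$$
$$\mathbf{S}_p(n) = [\mu_p(n), \mu_p(n-1), \ldots, \mu_p(n-M-D_p+1)]^T, \tag{10}$$
$$\mathbf{N}(n) = [\mathbf{n}^T(n), \mathbf{n}^T(n-1), \ldots, \mathbf{n}^T(n-M+1)]^T. \tag{11}$$
Define the following Sylvester convolution matrix of user $p$
by the impulse response of its vector channel,
$\mathbf{h}_p = [\mathbf{h}_p^T(0), \mathbf{h}_p^T(1), \ldots, \mathbf{h}_p^T(D_p)]^T$, as
$$\mathbf{H}_p^{(M)} = \begin{bmatrix}
\mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & \cdots & \mathbf{0} \\
\vdots & & \ddots & & \ddots & \vdots \\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p)
\end{bmatrix} \tag{12}$$
with the dimension of $MNJ \times (M + D_p)$, and (8) is extended to
$$\mathbf{X}(n) = \sum_{p=1}^{P} \mathbf{H}_p^{(M)}\,\mathbf{S}_p(n) + \mathbf{N}(n). \tag{13}$$
The output of the STAP under (13) is described as
$$y(n) = \mathbf{W}^T\mathbf{X}(n). \tag{14}$$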
Under the minimum mean square error (MMSE) criterion
$$\min_{\mathbf{W}} E\left|\mu_{p_0}(n-v) - y(n)\right|^2, \tag{15}$$
where $\mu_{p_0}(n)$ is the training chip sequence of the user $p_0$,
which is considered as the desired user, and $v \ge 0$ is the
delay of the training signal selected to minimize the mean square error,
the optimum weights are given by the Wiener-Hopf
equation as
$$\mathbf{W}_{p_0,\mathrm{chip}} = \mathbf{R}_x^{-1}\,\mathbf{r}_{p_0}(v), \tag{16}$$
where
$$\mathbf{R}_x = E\left[\mathbf{X}(n)\mathbf{X}^H(n)\right] \tag{17}$$
is the space-time correlation matrix, and
$$\mathbf{r}_{p_0}(v) = E\left[\mu_{p_0}^*(n-v)\,\mathbf{X}(n)\right] \tag{18}$$
expresses the cross-correlation vector between the training
signal and the received signal vector. It is seen that the
complexity of the chip-level adaptive filter depends on the
dimension of the signal vector, i.e., the dimension of the
weight vector, which is selected based on the length of the
associated channels.
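For illustration, the Wiener-Hopf solution can be estimated from a block of snapshots by replacing the expectations in (17) and (18) with sample averages. The sketch below is a minimal pure-Python version under an explicit assumption: the filter output is taken as $y = \mathbf{w}^H\mathbf{X}$, for which $\mathbf{w} = \mathbf{R}^{-1}\mathbf{r}$ is the exact minimizer. The helper names `solve` and `wiener_weights` and the toy data are hypothetical.

```python
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0j] * n
    for i in reversed(range(n)):
        acc = aug[i][n] - sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = acc / aug[i][i]
    return w


def wiener_weights(snapshots: List[Vector], training: Vector) -> Vector:
    """Sample estimates of R = E[X X^H] and r = E[d^* X], then w = R^{-1} r.
    Assumption: the filter output is y = w^H X."""
    n, count = len(snapshots[0]), len(snapshots)
    R = [[sum(x[i] * x[j].conjugate() for x in snapshots) / count
          for j in range(n)] for i in range(n)]
    r = [sum(d.conjugate() * x[i] for x, d in zip(snapshots, training)) / count
         for i in range(n)]
    return solve(R, r)


# Toy check: the training signal is generated by a known filter w_true.
w_true = [1 + 1j, 2 + 0j]
snapshots = [[1 + 0j, 0j], [0j, 1 + 0j], [1 + 0j, 1j]]
training = [sum(w.conjugate() * xi for w, xi in zip(w_true, x)) for x in snapshots]
w_hat = wiener_weights(snapshots, training)
print(w_hat)
```

The cubic cost of the linear solve in the vector dimension is what makes the dimension of the weight vector the dominant complexity factor.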
It is noted that in CDMA systems, the performance of the
chip-level processing is limited by the number of
degrees of freedom (DOF's) provided by the employed
array and by the cyclostationarity of the users' signals. This
problem can be mitigated by symbol-level
processing, where the MUAI components become quasi-random
noise after despreading with the signature code of
the desired user.
2. Symbol-level optimum adaptive processing
Symbol-level processing is so called because the taps of the
space-time filter are spaced by one symbol duration. Similar to the
oversampling-based subchannel formulation in (7)
and (8), the subchannel-based signal vector after
despreading the array received signals with the signature
code of the desired user $p_0$ is denoted as
$$\mathbf{X}_c(mN_c) = \sum_{l=0}^{N_c-1} \mathbf{X}_s(mN_c + l)\,c_{p_0}(l), \tag{19}$$
where
$$\mathbf{X}_s(\beta) = [\mathbf{x}^T(\beta), \mathbf{x}^T(\beta-1), \ldots, \mathbf{x}^T(\beta-N_c+1)]^T. \tag{20}$$
By stacking $K$ consecutive-symbol samples, we have the
space-time signal vector as
$$\mathbf{X}_c(m) = [\mathbf{X}_c^T(mN_c), \ldots, \mathbf{X}_c^T((m-K+1)N_c)]^T. \tag{21}$$
Let $M = KN_c$. From (19)-(21), it is seen that $\mathbf{X}_c(m)$ has the
same form as (13). This implies
$$\mathbf{X}_c(m) = \sum_{p=1}^{P} \mathbf{H}_p^{(M)} \sum_{l=0}^{N_c-1} \mathbf{S}_p(mN_c+l)\,c_{p_0}(l)
+ \sum_{l=0}^{N_c-1} \mathbf{N}(mN_c+l)\,c_{p_0}(l). \tag{22}$$
It is seen that
$$\sum_{l=0}^{N_c-1} \mathbf{S}_{p_0}(mN_c+l)\,c_{p_0}(l)$$
has $KN_c + D_{p_0}$
components that are the consecutive samples of the single-path
despreading signal waveform plotted in Fig. 1, where
the peaks are the desired finger outputs. The peak
components of the vectors
$$\sum_{l=0}^{N_c-1} \mathbf{S}_p(mN_c+l)\,c_{p_0}(l), \quad p \ne p_0,$$
which stand for the MUAI's, should be suppressed because they
could lead to false fingers in situations where the near-far
problem exists. When there is no near-far problem, they
are considered as quasi-random noise. The symbol-level
adaptive processing can be performed based on (21), i.e.,
$$y_c(m) = \mathbf{W}^T\mathbf{X}_c(m) = \sum_{l=0}^{K-1} \mathbf{w}_l^T\,\mathbf{X}_c((m-l)N_c), \tag{23}$$
where $\mathbf{W} = [\mathbf{w}_0^T, \mathbf{w}_1^T, \ldots, \mathbf{w}_{K-1}^T]^T$. Similar to the chip-level
processing, under the symbol-level MMSE criterion
$$\min_{\mathbf{W}} E\left|s_{p_0}(m-v) - y_c(m)\right|^2, \tag{24}$$
the optimum weight vector is obtained as
$$\mathbf{W}_{p_0,\mathrm{symbol}} = \mathbf{R}_c^{-1}\,\boldsymbol{\gamma}_{p_0}(v), \tag{25}$$
where
$$\mathbf{R}_c = E\left[\mathbf{X}_c(m)\mathbf{X}_c^H(m)\right], \tag{26}$$
$$\boldsymbol{\gamma}_{p_0}(v) = E\left[s_{p_0}^*(m-v)\,\mathbf{X}_c(m)\right], \tag{27}$$
$s_{p_0}(m)$ denotes the training symbol sequence of the $p_0$-th user,
and $v$ is selected in the same way as explained for (15).
It is noted that the above filter (25) still has the same
complexity as that given in (16).
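The despreading step in (19) amounts to correlating one symbol's worth of chip-rate samples with the desired user's code. The sketch below shows this for a scalar, noise-free, chip-synchronous case with two users; the function name `despread`, the two length-4 codes (chosen to be orthogonal), and the symbol values are hypothetical.

```python
from typing import List, Sequence


def despread(chips: Sequence[complex], code: Sequence[int], m: int) -> complex:
    """Correlate the m-th block of Nc chips with the code:
    sum_{l=0}^{Nc-1} chips[m*Nc + l] * code[l]."""
    nc = len(code)
    return sum(chips[m * nc + l] * code[l] for l in range(nc))


# Hypothetical orthogonal codes for the desired user and one interferer.
code_desired = [1, -1, 1, 1]
code_other = [1, 1, -1, 1]
symbols_desired = [1, -1]
symbols_other = [-1, -1]

# Superposition of both users' chip sequences.
chips: List[complex] = [
    sd * cd + so * co
    for sd, so in zip(symbols_desired, symbols_other)
    for cd, co in zip(code_desired, code_other)
]
outputs = [despread(chips, code_desired, m) for m in range(2)]
print(outputs)
```

With orthogonal, aligned codes the interferer vanishes completely; with multipath and non-orthogonal codes the residual behaves like the quasi-random noise described in the text.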
IV. SECTORIZED SPACE-TIME
ADAPTIVE PROCESSING
1. Lower-rank beamspace transformation
Lower-rank beamspace transformation is known to be an
effective way to reduce the complexity of an array
processing system. Unlike the scheme of the conventional
beamforming, here we consider the smoothing subarray
beamforming illustrated in Fig. 2.
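Define $\mathbf{b} = [b_1, b_2, \ldots, b_{N-k}]^T$ as the beamformer vector,
which forms a beam to encompass the desired signal at each
of the $k+1$ subarrays ($k < N$). Then, the output signal vector
of the beamforming in Fig. 2 is denoted as
$$\mathbf{x}_b(t) = \mathbf{B}^T\mathbf{x}(t), \tag{28}$$
where $\mathbf{x}_b(t) = [x_{b,1}(t), x_{b,2}(t), \ldots, x_{b,k+1}(t)]^T$, and the beamformer
matrix $\mathbf{B}$ is expressed by
$$\mathbf{B} = \begin{bmatrix}
b_1 & 0 & \cdots & 0 \\
\vdots & b_1 & & \vdots \\
b_{N-k} & \vdots & \ddots & 0 \\
0 & b_{N-k} & & b_1 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & b_{N-k}
\end{bmatrix}_{N \times (k+1)}. \tag{29}$$
2. Sectorized space-time adaptive processing
The sectorized space-time adaptive processing can be
performed in the same way as described in Sections II
and III by replacing $\mathbf{x}(t)$ with $\mathbf{x}_b(t)$. Defining
$$\mathbf{z}_p(t) = \mathbf{B}^T\mathbf{h}_p(t), \tag{30}$$
$$\mathbf{n}_b(t) = \mathbf{B}^T\mathbf{n}(t), \tag{31}$$
and
$$\mathbf{x}_b(i\Delta + nT_c) = \mathbf{B}^T\mathbf{x}(i\Delta + nT_c), \quad i = 0, \ldots, J-1, \tag{32}$$
we have
$$\mathbf{x}_b(n) = \sum_{p=1}^{P}\sum_{d=0}^{D_p} \mu_p(n-d)\,\mathbf{z}_p(d) + \mathbf{n}_b(n), \tag{33}$$
where
$$\mathbf{x}_b(n) = [\mathbf{x}_b^T(nT_c), \ldots, \mathbf{x}_b^T(nT_c + (J-1)\Delta)]^T, \tag{34}$$
$$\mathbf{z}_p(n) = [\mathbf{z}_p^T(nT_c), \ldots, \mathbf{z}_p^T(nT_c + (J-1)\Delta)]^T, \tag{35}$$
$$\mathbf{n}_b(n) = [\mathbf{n}_b^T(nT_c), \ldots, \mathbf{n}_b^T(nT_c + (J-1)\Delta)]^T. \tag{36}$$
By stacking the $N_c$ consecutive samples, we have
$$\mathbf{X}_b(n) = [\mathbf{x}_b^T(n), \mathbf{x}_b^T(n-1), \ldots, \mathbf{x}_b^T(n-N_c+1)]^T. \tag{37}$$
The symbol-level vector after despreading the output
signal vector $\mathbf{X}_b(n)$ can be denoted as
$$\mathbf{X}_{bc}(mN_c) = \sum_{l=0}^{N_c-1} \mathbf{X}_b(mN_c + l)\,c_{p_0}(l). \tag{38}$$
Similar to (23), the symbol-level sectorized space-time
adaptive processing can be performed as
$$y_{bc}(mN_c) = \mathbf{W}^T\mathbf{X}_{bc}(m) = \sum_{l=0}^{K-1} \mathbf{w}_l^T\,\mathbf{X}_{bc}((m-l)N_c), \tag{39}$$
where
$$\mathbf{X}_{bc}(m) = [\mathbf{X}_{bc}^T(mN_c), \ldots, \mathbf{X}_{bc}^T((m-K+1)N_c)]^T. \tag{40}$$
Under the MMSE criterion
$$\min_{\mathbf{W}} E\left|s_{p_0}(m-v) - y_{bc}(mN_c)\right|^2, \tag{41}$$
the optimum weight vector is obtained as
$$\mathbf{W}_{p_0,\mathrm{sector}} = \mathbf{R}_{bc}^{-1}\,\mathbf{r}_{p_0}^{(b)}(v), \tag{42}$$
where
$$\mathbf{R}_{bc} = E\left[\mathbf{X}_{bc}(m)\mathbf{X}_{bc}^H(m)\right], \tag{43}$$
$$\mathbf{r}_{p_0}^{(b)}(v) = E\left[s_{p_0}^*(m-v)\,\mathbf{X}_{bc}(m)\right], \tag{44}$$
and $s_{p_0}(m)$ and $v$ have the same meaning as in (27).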
To further reduce the complexity, we can use only the
significant components of the vector $\mathbf{X}_{bc}(mN_c)$, i.e., those
exceeding a threshold, as is commonly done. We refer to this as the
simplified scheme. The results of the simplified scheme are
included and compared in the computer simulations.
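The component selection of the simplified scheme can be sketched as follows. The threshold rule (a multiple of the standard deviation of the component amplitudes, with the factor 1.8 quoted in Section V) follows the paper; the use of the population standard deviation, the function name `significant_indices`, and the sample vector are assumptions made only for this illustration.

```python
from statistics import pstdev
from typing import List, Sequence


def significant_indices(vec: Sequence[complex], factor: float = 1.8) -> List[int]:
    """Return indices of components whose amplitude exceeds
    factor * (standard deviation of all component amplitudes)."""
    amplitudes = [abs(v) for v in vec]
    threshold = factor * pstdev(amplitudes)  # population std: an assumption
    return [i for i, a in enumerate(amplitudes) if a > threshold]


# Hypothetical despread vector with two dominant finger outputs.
x_bc = [0.1, 0.2, 3.0, 0.1, 0.15, 2.5j, 0.05, 0.1]
kept = significant_indices(x_bc)
print(kept)
```

Only the retained components would then enter the correlation matrix and weight computation, so the size of the linear system drops from the full vector length to `len(kept)`.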
V. COMPUTER SIMULATIONS
Computer simulations are performed to confirm the
effectiveness of the proposed techniques. In these
simulations, an eight-element uniform linear array with
half-wavelength spacing is used. The array is partitioned
into subarrays, and beams are formed at each subarray. For
example, the beamformer for a three-subarray partitioning
(six array sensors at each subarray) can be designed as
$$\mathbf{b} = [e^{-j1.25u},\ e^{-j0.75u},\ e^{-j0.25u},\ e^{j0.25u},\ e^{j0.75u},\ e^{j1.25u}]^T,$$
where $u = 2\pi\sin(\theta_0)$ and $\theta_0$ denotes the central angle of
the sector where the spatial rays of the desired user signals
are located. In the simulations, 18 CDMA users' signals are
considered to be present, where user 1 is considered as the
desired user. The code length of all the users is 127. Each
user has 6 multipath rays. It is assumed that the AOA's of
the paths are Gaussian distributed for each user, and that their
propagation loss and time delay obey the Rayleigh and the
exponential distributions, respectively. Detailed parameters
for the desired user are given in Table 1. The signal-to-noise
ratio (SNR) of the direct ray of user 1 is assumed to be
-10 dB, and the SNR's of the direct rays of the other users are
randomly chosen from -12.7 dB to -6.6 dB. Their
nominal AOA's are uniformly distributed. The central angle
of the given sector is assumed to be $\theta_0 = 12.3°$.
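A minimal sketch of how the subarray beamformer vector and the banded smoothing matrix of (29) could be assembled for the simulated geometry (eight elements, three subarrays of six sensors, sector centre 12.3 degrees). The phase convention simply reproduces the exponents listed above; the function names `subarray_beamformer` and `smoothing_matrix` are hypothetical.

```python
import cmath
import math
from typing import List


def subarray_beamformer(n_sub: int, theta_deg: float) -> List[complex]:
    """Phase-only weights for a half-wavelength-spaced subarray, with element
    positions (in wavelengths) centred on the subarray midpoint."""
    u = 2 * math.pi * math.sin(math.radians(theta_deg))
    positions = [0.5 * (i - (n_sub - 1) / 2) for i in range(n_sub)]
    return [cmath.exp(1j * p * u) for p in positions]


def smoothing_matrix(b: List[complex], n_elements: int) -> List[List[complex]]:
    """N x (number of subarrays) matrix whose columns are shifted copies of b."""
    n_beams = n_elements - len(b) + 1
    B = [[0j] * n_beams for _ in range(n_elements)]
    for col in range(n_beams):
        for i, b_i in enumerate(b):
            B[i + col][col] = b_i
    return B


b = subarray_beamformer(6, 12.3)   # exponents -1.25u, -0.75u, ..., 1.25u
B = smoothing_matrix(b, 8)         # 8 x 3
print(len(B), len(B[0]))
```

Applying the transpose of `B` to an eight-element snapshot yields three beam outputs, so the spatial dimension seen by the adaptive filter drops from eight to three.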
We selected $K = 2$, i.e., two taps for the symbol-level
space-time adaptive processing. The steady-state residual
error powers of the normal sectorized STAP and its
simplified scheme are plotted in Fig. 3, where
the number of subarrays is varied from one to four. In the
simplified scheme, the threshold is taken as 1.8 times the
standard deviation of the amplitudes of the components of the
signal vector $\mathbf{X}_{bc}(mN_c)$. The residual error power of the
element-space STAP is -25.36 dB, which is considered as
the bound of the sectorized processing and is also plotted in
Fig. 3. It is clear that the results of the three-beam and four-beam
sector STAP are close to the bound, whereas the
complexity and the computational burden are greatly
reduced, especially for the simplified scheme, with an
acceptable performance loss.
VI. CONCLUSIONS
We have proposed sectorized STAP techniques for
CDMA systems, which provide an effective sub-optimal
low-complexity implementation of a STAP system. Simulation
results show performance close to that of the optimal
element-space STAP system.
REFERENCES
[1] A. J. Paulraj and C. B. Papadias, “Space-time processing
for wireless communications,” IEEE Signal Processing
Magazine, vol. 14, no. 6, pp. 49-83, Nov. 1997.
[2] R. Kohno, “Spatial and temporal communication theory
using adaptive antenna array,” IEEE Personal
Communications, vol. 5, no. 1, pp. 28-35, Feb. 1998.
[3] H. Liu and M. D. Zoltowski, “Blind equalization in
antenna array CDMA systems,” IEEE Trans. Signal
Processing, vol. 45, no. 1, pp. 161-172, Jan. 1997.
[4] T.-S. Lee and Z. S. Lee, “A sectorized beamspace adaptive
diversity combiner for multipath environments,” IEEE
Trans. Veh. Technol., vol. 48, pp. 1503-1510, Sept. 1999.
[5] J. Ramos, M. D. Zoltowski, and H. Liu, “Low-complexity
space-time processing for DS-CDMA communications,”
IEEE Trans. Signal Processing, vol. 48, no. 1, Jan. 2000.
[6] Y.-F. Chen, M. D. Zoltowski, J. Ramos, C. Chatterjee, and
V. P. Roychowdhury, “Reduced-dimension blind space-time
2-D Rake receivers for DS-CDMA communication
systems,” IEEE Trans. Signal Processing, vol. 48, no. 6,
June 2000.
[7] K. Yang, Y. Zhang, and Y. Mizuguchi, “Spatio-temporal
signal subspace-based subband space-time adaptive
processing,” in Proc. Int. Symp. on Antennas and
Propagation, Fukuoka, Japan, Aug. 2000.
[8] Y. Zhang, K. Yang, and M. G. Amin, “Transform domain
array processing for CDMA systems,” in Proc. IEEE
Workshop on Statistical Signal and Array Processing,
Pocono Manor, PA, Aug. 2000.
[9] B. D. Van Veen and R. A. Roberts, “Partially adaptive
beamformer design via output power minimization,”
IEEE Trans. Acoust., Speech, Signal Processing, vol.
ASSP-35, pp. 1524-1532, 1987.
Table 1 Parameters of the desired user
No.   AOA (deg)      Time delay   Propagation loss
1     12.3           0            0.045 + 0.998i
2     7              0.02         0.93 - 0.206i
3     [illegible]    0.32         [illegible]
4     14.2           1.33         [illegible]
5     11.6           1.81         0.355 - 0.264i
6     26.2           1.82         -0.264 + 0.034i
Fig. 1 Single-path despreading waveform
Fig. 2 Smoothing subarray beamforming
Fig. 3 Residual error power
DEMODULATION OF AMPLITUDE MODULATED SIGNALS
IN THE PRESENCE OF MULTIPATH
Zhengyuan Xu and Ping Liu
Dept. of Electrical Engineering
University of California
Riverside, CA 92521
{dxu, pliu}@ee.ucr.edu
ABSTRACT
Signals modulated by M-ary pulse amplitude modulation
(PAM) or M-ary quadrature amplitude modulation
(QAM) have a structured constellation.
When the communication channel introduces inter-symbol
interference (ISI) at the receiver end, demodulation of
such signals can be performed by constant modulus
algorithm (CMA) based equalizers to cancel the interference.
However, the characteristics of the modulated signals
are only partially considered in the CMA cost function.
In this paper, more constraints are imposed on
the equalized signal to fully capture the properties of
the modulated signal in both its phase and amplitude.
Observing that PAM signals are uniformly spaced on
the x-axis and QAM signals in a two-dimensional signal
space, the property of transmitted signals from each
category can be included in an equivalent deterministic
mathematical description, similar to the constant
modulus. This description is absorbed in our modified
cost function, resulting in a simultaneous minimization
of the dispersion relevant to the signal's phase and amplitude.
The performance of the equalizers based on these new
algorithms is compared with that of the CMA equalizer.
1. INTRODUCTION
In different wireless applications, different modulation
schemes are employed to meet specific resource or service
requirements. Each modulation exhibits its own
properties. Signals produced by M-ary pulse amplitude modulation
(PAM) or M-ary quadrature amplitude modulation
(QAM) have a structured constellation.
PAM signals are uniformly spaced on the real axis
(x-axis), while QAM signals are uniformly distributed
in a 2-dimensional signal space. If such signals are
transmitted through a multipath channel, signal demodulation
requires an equalizer to mitigate the channel
distortion. The particular source characteristics often
facilitate the equalizer design. The constant modulus
algorithm (CMA) based equalizer is widely used
[7] and shows a unique capability in equalizing signals
with the constant modulus property [5]. It was first
proposed in [3]. Extensive studies on such equalizers
have followed [1], [2], [4]. The algorithm minimizes the
deviation of the modulus of the equalized signal from a
constant. Satisfactory performance can be achieved,
especially when the transmitted signal has the constant
modulus property.
It seems that knowledge about the phase of the
modulated signal is disregarded in CMA. However, this
knowledge plays an equally important role in many cases in
representing a signal. It can be expected that its incorporation
into the cost function will improve the equalization
performance. To equalize a dispersive channel
(which could be complex) with M-PAM transmitted signals,
the dispersion in the distance of the equalized
signal away from the x-axis should also be minimized
together with its modulus deviation. Similarly, when
M-QAM signals are transmitted, it is not sufficient to
consider only the amplitude of the equalized signal in
a 2-dimensional signal space, since the signals are uniformly
distributed along two directions which are perpendicular
to each other and parallel to the two axes. Motivated
by the CMA algorithm, we will design new equalizers for
these two kinds of modulated signals by taking into
account their equally spaced property in our new cost
function. Similar to the CMA algorithm, stochastic
gradient descent methods are employed to update our
equalizers. The performance of the equalizers based
on these new algorithms is compared with that of the CMA
equalizer.
2. PROBLEM STATEMENT
In wireless communications, the multipath channel
introduces inter-symbol interference (ISI) in the received
signal $x \in \mathbb{C}^p$ [4]
$$x = Hs + w \tag{1}$$
where $s \in \mathbb{C}^m$ is the complex source vector drawn from either
an M-PAM or an M-QAM constellation, $H \in \mathbb{C}^{p \times m}$
is a complex channel matrix, $w \in \mathbb{C}^p$ represents additive
white Gaussian noise (AWGN), and $x \in \mathbb{C}^p$ is
the received signal vector. To detect the signal $s(l)$,
an equalizer $f \in \mathbb{C}^p$ is designed. Its output $y$ can be
written as
$$y = f^H x = a^T s + f^H w \tag{2}$$
where the superscripts $(\cdot)^T$ and $(\cdot)^H$ stand for transpose and
Hermitian transpose respectively, and $a^T = f^H H$ is the composite
response of the channel and the equalizer. Perfect
equalization can be achieved in the absence of noise if
the equalizer can compensate the channel in such a way
that $a$ has only one non-zero element [6]
$$a = e^{j\theta}\,[0, \ldots, 0, 1, 0, \ldots, 0]^T. \tag{3}$$
Therefore the output will be a delayed input with some
phase shift. ISI is completely eliminated in the absence
of noise. Different criteria can be used to seek perfect
equalization. In the CMA criterion, the dispersion of the
modulus of the equalizer output about a constant is minimized
$$J_c(f) = E\{(|y|^2 - r_0)^2\} \tag{4}$$
where “E” represents expectation and $r_0 = E\{|s|^4\}/E\{|s|^2\}$. The
algorithm is usually implemented by the stochastic gradient
descent method
$$f(k+1) = f(k) - \mu\,(|y(k)|^2 - r_0)\,y^*(k)\,x(k) \tag{5}$$
where $*$ represents conjugation. It is clear that the modulus
characteristic is captured and employed. However,
most modulated signals possess properties in both
amplitude and phase. The M-PAM or M-QAM signals
take discrete values from a set whose elements lie
uniformly on the x-axis or in a 2-dimensional signal space.
Motivated by the CMA criterion, we will derive a new cost
function to incorporate this information and develop a
corresponding algorithm to obtain the equalizer next.
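The stochastic gradient recursion (5) can be sketched in a few lines. This is a minimal illustration of one CMA iteration with the equalizer output taken as $y = f^H x$; the function name `cma_step` and the numerical values are hypothetical and not taken from the simulations.

```python
from typing import List, Tuple

Vector = List[complex]


def cma_step(f: Vector, x: Vector, r0: float, mu: float) -> Tuple[Vector, complex]:
    """One CMA update: f <- f - mu * (|y|^2 - r0) * conj(y) * x, with y = f^H x."""
    y = sum(fi.conjugate() * xi for fi, xi in zip(f, x))
    error = (abs(y) ** 2 - r0) * y.conjugate()
    f_next = [fi - mu * error * xi for fi, xi in zip(f, x)]
    return f_next, y


# Hypothetical two-tap equalizer and received vector.
f0 = [1 + 0j, 0j]
x0 = [2 + 0j, 1 + 0j]
f1, y0 = cma_step(f0, x0, r0=1.0, mu=0.1)
print(y0, f1)
```

When the output modulus already satisfies $|y|^2 = r_0$, the error term is zero and the taps are left unchanged, regardless of the phase of $y$.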
3. PROPOSED EQUALIZERS
Let us first review the representations and properties
of PAM and QAM signals. The PAM signals are one
dimensional in the sense that they are real and uniformly
distributed on the real axis. The QAM signals
are complex and uniformly spaced in the directions of the real
axis and the imaginary axis. Due to this similarity, the
properties of QAM can be easily found once the properties
of PAM signals are explored. For a general discussion,
the multipath channel and the equalizer are
assumed to be complex for both cases. We start with
the equalization of PAM signals.
3.1. PAM signals
M-ary PAM signals can be represented by
$$s_{\bar m} = (2\bar m - 1 - M)d, \quad \bar m = 1, \ldots, M$$
where $\bar m$ is a random integer index. Usually $M$ is an even
integer and can be written as $M = 2L$. These PAM
signals can also be expressed by
$$s_m = (2m - 1)d, \quad m = -L+1, \ldots, L \tag{6}$$
if we define a new variable $m = \bar m - L$. We will adopt
this signal description later. In (6), $m$ can only take
integer values from $-L+1$ to $L$, and can be expressed in terms of $s_m$
as $m = \frac{s_m + d}{2d}$. In the current context, this constraint
is equivalent to $\sin(m\pi) = 0$. Thus it requires
$$\sin\!\left(\frac{s_m + d}{2d}\pi\right) = \cos\!\left(\frac{s_m}{2d}\pi\right) = 0. \tag{7}$$
The transformation from (6) to (7) is essential in constructing
our cost function. The other property of $s_m$
is that its phase is a multiple of $\pi$ because $s_m$
lies on the real axis. Therefore
$$\sin\phi = 0 \tag{8}$$
where $\phi$ is the phase of $s_m$. Taking into account the
complex equalized signal, we can combine (4), (7) and
(8) in one cost function
$$J_1(f) = E\left\{(|y|^2 - r_0)^2 + \alpha_1\cos^2\!\left(\frac{|y|}{2d}\pi\right) + \alpha_2\sin^2(\phi)\right\} \tag{9}$$
where $\alpha_1$ and $\alpha_2$ are weighting factors, $y$ is the equalized
signal given by (2), and $\phi$ is its phase. In (9), $y$ and
$\phi$ are functions of our equalizer $f$. Therefore $J_1(f)$ is
a highly nonlinear function of $f$ and difficult to minimize.
Similar to the CMA algorithm, we update the equalizer
according to the gradient descent method
$$f(k+1) = f(k) - \mu_1 \nabla J_1(f)\big|_{f = f(k)}. \tag{10}$$
The derivative of $J_1(f)$ with respect to $f^H$ is required
in (10). It can be derived term by term from the RHS of
(9). The first term comes directly from CMA. Its derivative
can be easily found to be
$$\left(E\{(|y|^2 - r_0)^2\}\right)'_f = 2E\{(|y|^2 - r_0)\,y^* x\}. \tag{11}$$
For the remaining terms, the derivatives can be computed
once the derivatives of $|y|$ and $\phi$ are obtained. If we express
$|y|$ as $\sqrt{yy^*}$, then the derivative of $|y|$ is easily
computed to be
$$(|y|)'_f = \frac{y^* x}{2|y|}. \tag{12}$$
The phase $\phi$ can be expressed in terms of $f$ as
$$\phi = \arctan\frac{y - y^*}{j(y + y^*)} = \arctan\frac{f^H x - x^H f}{j(f^H x + x^H f)}. \tag{13}$$
Therefore the derivative of $\phi$ can be shown to be
$$(\phi)'_f = \frac{x^H f}{2j|y|^2}\,x = \frac{1}{2jy}\,x. \tag{14}$$
According to (9), (11), (12) and (14), the gradient
$\nabla J_1(f)$ is obtained as
$$\nabla J_1(f) = E\{\beta x\} \tag{15}$$
where
$$\beta = 2(|y|^2 - r_0)\,y^* + \alpha_2\frac{\sin(2\phi)}{2jy} - \alpha_1\frac{\pi y^*\sin\!\left(\frac{\pi|y|}{d}\right)}{4d\,|y|}.$$
Therefore the stochastic gradient algorithm for the equalizer
follows
$$f(k+1) = f(k) - \mu_1\beta x. \tag{16}$$
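As a hedged sketch of update (16), the code below evaluates $\beta$ as the sum of three contributions obtained by differentiating the cost (9) term by term: $2(|y|^2 - r_0)y^*$ from the modulus term, $\alpha_2\sin(2\phi)/(2jy)$ from the phase term, and $-\alpha_1\pi y^*\sin(\pi|y|/d)/(4d|y|)$ from the spacing term. The function names `pam_beta` and `pam_step` and the test values are hypothetical, and the code assumes $y \ne 0$.

```python
import cmath
import math
from typing import List

Vector = List[complex]


def pam_beta(y: complex, r0: float, d: float, alpha1: float, alpha2: float) -> complex:
    """Scalar beta multiplying x in the stochastic gradient of J_1 (requires y != 0)."""
    modulus = abs(y)
    phi = cmath.phase(y)
    modulus_term = 2 * (modulus ** 2 - r0) * y.conjugate()
    phase_term = alpha2 * math.sin(2 * phi) / (2j * y)
    spacing_term = (-alpha1 * math.pi * y.conjugate()
                    * math.sin(math.pi * modulus / d) / (4 * d * modulus))
    return modulus_term + phase_term + spacing_term


def pam_step(f: Vector, x: Vector, mu1: float, r0: float, d: float,
             alpha1: float, alpha2: float) -> Vector:
    """One update f <- f - mu1 * beta * x, with y = f^H x."""
    y = sum(fi.conjugate() * xi for fi, xi in zip(f, x))
    beta = pam_beta(y, r0, d, alpha1, alpha2)
    return [fi - mu1 * beta * xi for fi, xi in zip(f, x)]


# An output sitting exactly on a constellation point gives (almost) zero beta.
on_grid = pam_beta(0.1 + 0j, r0=abs(0.1 + 0j) ** 2, d=0.1, alpha1=0.005, alpha2=0.0)
# An output halfway between 0.1 and 0.2 is pushed back toward 0.1.
off_grid = pam_beta(0.15 + 0j, r0=0.15 ** 2, d=0.1, alpha1=0.005, alpha2=0.0)
print(abs(on_grid), off_grid)
```

In the second case only the spacing term is active, and its positive real value means the update reduces the output from 0.15 toward the nearest constellation point 0.1.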
3.2. QAM signals
There are some similarities between PAM and QAM
signals. In the signal space, QAM signals can be depicted
by $(s_x, s_y)$ where
$$s_x = (2m_x - 1)d_x, \quad m_x = -L_x+1, \ldots, L_x \tag{17}$$
$$s_y = (2m_y - 1)d_y, \quad m_y = -L_y+1, \ldots, L_y. \tag{18}$$
This representation can be transformed into (see (7))
$$\cos\!\left(\frac{s_x}{2d_x}\pi\right) = 0, \quad \cos\!\left(\frac{s_y}{2d_y}\pi\right) = 0. \tag{19}$$
Therefore we can build the following cost function
$$J_2(f) = E\left\{\cos^2\!\left(\frac{y_1}{2d_x}\pi\right) + \cos^2\!\left(\frac{y_2}{2d_y}\pi\right)\right\} \tag{20}$$
with $y_1$ and $y_2$ the real and imaginary parts of $y$
respectively. The gradient descent recursion for the
equalizer can be formulated as
$$f(k+1) = f(k) - \mu_2 \nabla J_2(f)\big|_{f = f(k)}. \tag{21}$$
To compute $\nabla J_2(f)$, we first evaluate the derivatives of $y_1$
and $y_2$. If they are expressed explicitly in terms of $f$,
$$y_1 = \frac{y + y^*}{2} = \frac{f^H x + x^H f}{2}, \qquad
y_2 = \frac{y - y^*}{2j} = \frac{f^H x - x^H f}{2j},$$
then it is easy to show that their derivatives have the
form
$$(y_1)'_f = \frac{x}{2}, \qquad (y_2)'_f = \frac{x}{2j}.$$
Based on these results and (20), $\nabla J_2(f)$ can be derived
to be
$$\nabla J_2(f) = -E\{\eta x\} \tag{22}$$
where
$$\eta = \frac{\pi\sin\!\left(\frac{y_1}{d_x}\pi\right)}{4d_x} + \frac{\pi\sin\!\left(\frac{y_2}{d_y}\pi\right)}{4jd_y}.$$
In the case $d_x = d_y = 1$, $\eta$ is simplified to
$$\eta = \frac{\pi}{4}\left[\sin(y_1\pi) - j\sin(y_2\pi)\right].$$
Substituting (22) in (21) and using the instantaneous
approximation, we can update the equalizer according to
$$f(k+1) = f(k) + \mu_2\,\eta\,x. \tag{23}$$
The equalization method proposed for either a PAM
source or a QAM source in this section explicitly considers
the phase and modulus properties of the transmitted
signals. As a result, superior performance is
expected compared with the conventional CMA equalizer,
which only captures the modulus property.
4. SIMULATIONS
In this section we provide some simulation examples
to demonstrate the applicability of the proposed PAM
and QAM equalization methods. We also compare
them with the CMA algorithm [3] in terms of
inter-symbol interference (ISI) and error probability.
The ISI is used to illustrate the convergence
property of the algorithm and is defined as
$$\mathrm{ISI} = \frac{\sum_i |a_i|^2 - |a|_{\max}^2}{|a|_{\max}^2}$$
where $a^T = f^H H$ and $|a|_{\max}$ is the largest absolute value
of all elements in $a$. Under perfect equalization, $a$ has
only one nonzero component as in (3). Then the ISI becomes
zero. Therefore, a small ISI indicates proximity
to the desired response. To gain more insight
into the performance of the methods in the communications
context, we also adopt error probability as
the other measure. It is defined as the percentage of
accumulated decoding errors among the total number of
transmitted symbols up to the current iteration, and is
obtained from multiple independent realizations with
random input signals.
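The ISI measure can be computed directly from the combined channel-equalizer response. The sketch below assumes the squared-magnitude form of the definition (the form used in [6]); the function name `isi` and the sample responses are hypothetical.

```python
from typing import Sequence


def isi(a: Sequence[complex]) -> float:
    """ISI = (sum_i |a_i|^2 - max_i |a_i|^2) / max_i |a_i|^2
    for a combined response a with at least one nonzero element."""
    powers = [abs(a_i) ** 2 for a_i in a]
    peak = max(powers)
    return (sum(powers) - peak) / peak


perfect = isi([0, 1j, 0])          # single nonzero tap with a phase shift
residual = isi([0.1, 1.0, 0.1])    # small leakage into neighbouring taps
print(perfect, residual)
```

A value is often reported in decibels as `10 * log10(isi(a))`, which is how a statement such as "15 dB lower" would be read off an ISI curve.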
In the experiments, we consider an unknown nonminimum
phase channel impulse response used in [6]
with the first 4 coefficients [-0.400 0.840 0.336 0.134].
The equalizer has 12 taps and is initialized to
$[0, \ldots, 0, 1, 0, \ldots, 0]^T$, i.e., all zeros except that the seventh
element is 1. 5000 iterations are run in each realization.
In total, 50 independent realizations are performed to
obtain the average results.
First, we compare the proposed PAM equalizer with the
Godard approach [3] for a PAM source. The input
signals take six equiprobable values: {±0.1, ±0.3, ±0.5}.
The step size $\mu$ is set to 0.085, and the weighting factors are
$\alpha_1 = 0.005$ and $\alpha_2 = 0$ (since a real channel is used).
The first 500 iterations are used for initialization for
both methods. The average ISI after 500 iterations is
plotted in Fig. 1. The solid line represents the proposed
PAM method and the dashed line CMA. It
is observed that the ISI of the proposed PAM method
converges to a level 15 dB lower than that of CMA
while maintaining the same fast convergence. The error
probability is shown in Fig. 2. In fact, based on
our observation, the proposed method makes no
errors after convergence (800 iterations), while CMA
still accumulates some errors.
Our second experiment considers a QAM source with
4 equiprobable values {±1 ± j}. We again compare the
proposed QAM scheme with the CMA algorithm [3].
The first 20 data points are used for initialization for
both methods. The average ISI and error probability
after 20 iterations are plotted in Fig. 3 and Fig. 4
respectively. Solid lines represent the proposed QAM
equalization method and dashed lines CMA. It is
seen that the ISI of the proposed QAM scheme
converges faster than that of the standard CMA while
achieving a much lower level after convergence. The
error probability of the proposed method is also much
lower than that of CMA. This fact is reflected by
the difference in the constellation diagrams of the equalized
outputs for all iterations from a randomly picked
realization, as shown in Fig. 5 and Fig. 6. It is interesting
to note that the equalized outputs of our equalizer
have a much smaller variation than those of the CMA
equalizer.
5. REFERENCES
[1] Z. Ding, C.R. Johnson and R.A. Kennedy, “On the
(non)existence of undesirable equilibria of Godard
equalizers”, IEEE Trans. on Signal Processing,
vol. 40, pp. 2425-2432, Oct. 1992.
[2] G.J. Foschini, “Equalization without altering or
detecting data”, AT&T Tech. J., vol. 64, no. 8,
pp. 1885-1911, Oct. 1985.
[3] D.N. Godard, “Self-recovering equalization and
carrier tracking in two dimensional data communication
systems”, IEEE Trans. on Comm., vol.
28, no. 11, pp. 1167-75, November 1980.
[4] H. Zeng, L. Tong and C.R. Johnson, “An Analysis
of Constant Modulus Receivers”, IEEE Trans. on
Signal Processing, vol. 47, no. 11, pp. 2990-2999,
November 1999.
[5] C.R. Johnson, et al., “Blind Equalization Using
Constant Modulus Criterion: A Review”, Proc. of
the IEEE, vol. 86, no. 10, pp. 1927-1950, October
1998.
[6] O. Shalvi and E. Weinstein, “New criteria for
blind deconvolution of nonminimum phase systems
(channels)”, IEEE Transactions on Information
Theory, vol. 36, no. 2, pp. 312-21, March 1990.
[7] J.R. Treichler, I. Fijalkow and C.R. Johnson,
“Fractionally spaced equalizers”, IEEE Signal
Processing Mag., pp. 45-81, May 1996.
Figure 1: ISI of the proposed method and Godard's method with PAM sources.
Figure 2: Error probability of the proposed method and Godard's method with PAM sources.
Figure 3: ISI of the proposed method and Godard's method with QAM sources.
Figure 4: Error probability of the proposed method and Godard's method with QAM sources.
Figure 5: Equalized output of the proposed method with QAM sources.
Figure 6: Equalized output of Godard's method with QAM sources.
(Plot legend: dashed line, CMA; solid line, proposed. Horizontal axis: iteration number.)
MULTICHANNEL AND BLOCK BASED PRECODING METHODS FOR FIXED
POINT EQUALIZATION OF NONLINEAR COMMUNICATION CHANNELS
Arthur J. Redfern
Texas Instruments
12500 TI Boulevard, MS 8653
Dallas, TX 75243
redfern@ti.com

G. Tong Zhou
Georgia Institute of Technology
School of ECE
Atlanta, GA 30332-0250
gtz@ece.gatech.edu
ABSTRACT
Substantial power efficiency improvements are possible
in communication systems if a moderate amount of
nonlinearity is permitted at the transmitter amplifier
and corrected for at the receiver. The Volterra series
is a suitable model for many power amplifiers, and is
readily incorporated into communication channel models.
Existing fixed point equalization algorithms for
Volterra channels place restrictive conditions on the
locations of first-order kernel zeros. We show that
multichannel and block based precoding linear equalization
techniques can be combined with the fixed point
equalizer to allow for exact equalization of Volterra systems
with mixed-phase first-order kernels.
1. INTRODUCTION
The design of a communication system, from the data
format to the transceivers, is composed of many parts.
Radio frequency power amplifier design is an important
component of cellular, television, radio, and data
transmission systems. In amplifier design the requirements
of power efficiency and linearity can be at odds with
each other, with the result being that power efficiency
is sacrificed in order to meet linearity requirements [2].
Substantial efficiency improvements can be possible
if some mild nonlinearity is allowed in the transmitter
amplifier and corrected for at the receiver. This improved
efficiency translates to lower operating costs,
longer battery life, and smaller devices. A penalty
of allowing additional nonlinearity into the system is
that the equalizer must now compensate for a nonlinear
channel.
In this paper we consider fixed point equalization of communication
channels modeled by the Volterra series [3], [4], [8]. Fixed point
equalization in this case refers to the contraction mapping theorem
[3] (not integer arithmetic). The Volterra series is a useful
nonlinear model for amplifiers [2], and is readily incorporated into
the overall channel model as an extension of linear convolution.
This work was supported in part by NASA grant NGT-352334 and NSF
grant MIP-9703312.
Drawbacks of traditional fixed point equalization techniques include
the requirement that the linear component of the channel is
minimum-phase (for stable exact inverses) [3] or that its zeros are
not near the unit circle (for approximate inverses) [4]. These can be
serious limitations for realistic communication channel models, as the
error in the inversion of the linear channel component is iterated on
by the fixed point algorithm.
Recently, multichannel [7] and block based precoding methods [5] have
become popular for linear channel equalization. This is because both
methods convert the ill-posed single channel inversion problem into a
well-posed problem with an exact (zero forcing) solution in the
noise-free case. We show that these principles can be combined with
the fixed point equalizer, for zero forcing equalization of nonlinear
channels with mixed-phase first-order kernels.
2. THE VOLTERRA SERIES
For the discrete Jth-order Volterra system H, the input x(n) is
related to the output y(n) by:
$$
\begin{aligned}
y(n) &= H(x(n), \ldots, x(n-L)) \\
     &= \sum_{j=1}^{J} H_j(x(n), \ldots, x(n-L_j)) \\
     &= \sum_{j=1}^{J} \sum_{\tau_1=0}^{L_j} \cdots \sum_{\tau_j=\tau_{j-1}}^{L_j}
        h_j(\tau_1, \ldots, \tau_j) \prod_{i=1}^{j} x(n-\tau_i),
\end{aligned}
$$
where $H_j$ is the jth-order operator of H, $h_j(\tau_1, \ldots, \tau_j)$
is the nonredundant region of the jth-order kernel, and
$L = \max\{L_1, \ldots, L_J\}$. Notice that a first-order Volterra
system (J = 1) is linear convolution (an FIR filter).
0-7803-5988-7/00/$10.00 © 2000 IEEE
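The triple sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `volterra_output`, the dictionary layout of the kernels (one entry per nondecreasing delay tuple), and the zero initial conditions are choices made here, not part of the paper. The products are taken exactly as written in the formula, without conjugated terms.

```python
import numpy as np

def volterra_output(x, kernel):
    """Evaluate y(n) for a Volterra system stored in nonredundant form.

    kernel maps a nondecreasing delay tuple (tau_1 <= ... <= tau_j) to the
    coefficient h_j(tau_1, ..., tau_j); the tuple length j is the term order.
    Samples before n = 0 are assumed to be zero (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=complex)
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        for taus, coeff in kernel.items():
            if n - max(taus) < 0:
                continue  # term touches a sample before the block: contributes zero
            y[n] += coeff * np.prod([x[n - t] for t in taus])
    return y

# Hypothetical kernel: a two-tap linear part plus one third-order term.
kernel = {(0,): 1.0, (1,): 0.5, (0, 0, 1): 0.1}
y = volterra_output([1.0, 2.0, 3.0], kernel)
```

With only first-order entries in `kernel`, the function reduces to an FIR filter, which matches the remark that J = 1 is linear convolution.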
Throughout this paper the symbol u(n) will be used
to refer to the linear component of the Volterra system
with additive noise v(n):
$$u(n) = \sum_{\tau_1=0}^{L_1} h_1(\tau_1)\, x(n-\tau_1) + v(n).$$
For the channel input x(n), output y(n), noise v(n), and linear
portion of the output with noise u(n), it will be assumed that these
vectors are composed of a basic block of N symbols, and a subscript
will indicate how many symbols before this basic block to include,
e.g.:
$$x_L = [x(-L), \ldots, x(N-1)]^T.$$
An optional argument can be included to specify a subset of the
vector:
$$x_L(a:b) = [x(a), \ldots, x(b)]^T.$$
If the d sample delay operator $z^{-d}$ is placed before the vector,
then each element of the vector is delayed by d samples:
$$z^{-d}x_L = [x(-L-d), \ldots, x(N-1-d)]^T.$$
We define the Volterra series relationship between an input vector
$x_L$ and output vector $y_0$ as:
$$y_0 = H(x_L).$$
As a shorthand notation to refer to the output of specific order
operators, we define:
$$H_{a:b}(x_L) = \sum_{j=a}^{b} H_j(x_{L_j}).$$
It is often necessary to write the first-order operator corresponding
to a finite impulse response (FIR) filter as a filtering matrix. For
the length Q + 1 vector $c = [c(Q), \ldots, c(0)]^T$, the
N x (N + Q) filtering matrix $T_N(c)$ is defined as:
$$
T_N(c) = \begin{bmatrix}
c(Q) & \cdots & c(0) &        &      \\
     & \ddots &      & \ddots &      \\
     &        & c(Q) & \cdots & c(0)
\end{bmatrix}.
$$
3. FIXED POINT EQUALIZATION
In this section we review the single channel fixed point equalizer
based on the contraction mapping theorem. The basic idea underlying
fixed point equalization of Volterra channels is setting up a fixed
point equation for the input in terms of the known system kernels and
system output, then solving for the input using the method of
successive approximations [3]. Two assumptions are implicit:
Assumption (A1): The K + L previous input symbols
x(-K-L), ..., x(-1) have already been estimated.
Assumption (A2): The K previous output samples y(-K), ..., y(-1) are
available.
In the following derivation, even though $x_0$ will be on the left
hand side of the equation and $x_{K+L}$ will be on the right hand
side, there is still a fixed point equation in $x_0$ since $x_{K+L}$
can be formed directly from $x_0$ and the previously estimated symbols
of (A1).
The derivation of the fixed point equalizer is well known in the
literature [3]. Here the derivation is performed using the notation of
Section 2, which will emphasize the importance of the inversion of the
linear component of the noisy channel output.
Figure 1: A single channel Volterra system.
The input/output relationship for the single channel Volterra system
in Fig. 1 with additive noise at the receiver is:
$$y_0 = H_N x_{L_1} + H_{2:J}(x_L) + v_0, \quad (1)$$
where $H_N = T_N(h_1)$. Rearranging the terms and applying the linear
operator $G_s$ with memory K to both sides yields:
$$G_s(H_{K+N} x_{K+L_1} + v_K) = G_s(y_K) - G_s H_{2:J}(x_{K+L}). \quad (2)$$
Notice that each of the vectors and matrices from (1) to (2) has been
extended by K samples in the past (available from (A1) and (A2)) since
the operator $G_s$ has memory K. To set up the desired fixed point
equation, it is necessary to make the left hand side of (2)
$z^{-d}x_0$. Define the single channel error term as
$$\epsilon_s = z^{-d}x_0 - G_s(H_{K+N} x_{K+L_1} + v_K).$$
It is common to choose $G_s$ corresponding to a causal Kth-order FIR
filter
$$g_s = [g_s(K), \ldots, g_s(0)]^T,$$
designed according to the minimum mean-square error (MMSE) criterion.
For the MMSE equalizer it is necessary to make the following
assumption:
Assumption (A3): The input x(n) and the noise v(n) are mutually
uncorrelated, stationary random processes with known covariance
matrices:
$$R_{xx} = E[x_{K+L_1}(n-K-L_1 : n)\, x^*_{K+L_1}(n-K-L_1 : n)],$$
$$R_{vv} = E[v_K(n-K : n)\, v^*_K(n-K : n)],$$
respectively.
If (A1) - (A3) are satisfied, then the equalizer $g_s$ can be solved
for as:
$$g_s = R_{uu}^{-1} r_{xu}, \quad (3)$$
where $R_{uu}$ and $r_{xu}$ are defined as
$$R_{uu} = H_{K+1} R_{xx} H^*_{K+1} + R_{vv},$$
$$r_{xu} = H^*_{K+1} E[x(n-d)\, x^*_{K+L_1}(n-K-L_1 : n)].$$
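As a rough numerical illustration of (3), the sketch below designs the linear MMSE equalizer for a small channel. It makes several simplifying assumptions that are not in the paper: the channel taps are real-valued (so conjugates drop out), the input and noise are white with variances `sigma_x2` and `sigma_v2`, and the example taps, helper names, and parameter values are hypothetical.

```python
import numpy as np

def filtering_matrix(c_taps, n_rows):
    """T_N(c) for taps [c(0), ..., c(Q)]: every row holds [c(Q), ..., c(0)],
    shifted one column to the right per row, giving an N x (N + Q) matrix."""
    c_taps = np.asarray(c_taps, dtype=float)
    q = len(c_taps) - 1
    T = np.zeros((n_rows, n_rows + q))
    for i in range(n_rows):
        T[i, i:i + q + 1] = c_taps[::-1]
    return T

def mmse_equalizer(h1, K, d, sigma_x2=1.0, sigma_v2=1e-4):
    """Solve R_uu g = r_xu for a real-valued channel with white input and noise."""
    L1 = len(h1) - 1
    H = filtering_matrix(h1, K + 1)          # (K+1) x (K+L1+1)
    R_xx = sigma_x2 * np.eye(K + L1 + 1)
    R_vv = sigma_v2 * np.eye(K + 1)
    R_uu = H @ R_xx @ H.T + R_vv
    idx = K + L1 - d                         # position of x(n-d) in [x(n-K-L1), ..., x(n)]
    r_xu = H @ R_xx[:, idx]
    return np.linalg.solve(R_uu, r_xu), H

# Hypothetical minimum-phase channel; K and d follow the simulation section.
g_s, H = mmse_equalizer([1.0, 0.5, 0.2], K=32, d=16)
combined = H.T @ g_s                          # channel-plus-equalizer response
```

For this minimum-phase example the combined response is close to a unit pulse at the chosen delay. For a mixed-phase channel, or zeros near the unit circle, the residual grows, which is the source of the error term discussed in this section.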
Substitution of the operator $G_s$ associated with the filter $g_s$
into (2) results in the fixed point equation:
$$z^{-d}x_0 = G_s(y_K) - G_s H_{2:J}(x_{K+L}) + \epsilon_s.$$
Assuming that $\epsilon_s$ is small, it is ignored and the approximate
fixed point equation is solved:
$$z^{-d}x_0 = G_s(y_K) - G_s H_{2:J}(x_{K+L}).$$
For the case of d = 0, $x_{K+L}$ can be determined from $z^{-d}x_0$
and (A1). However, when d > 0, it is not possible to determine the
last d elements of $x_{K+L}$, namely x(N-d), ..., x(N-1), from
$z^{-d}x_0$. To obtain proper estimates of these last d symbols, they
could be the first symbols estimated in the next block of data.
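The method of successive approximations can be illustrated with a toy example. The sketch below is not the paper's equalizer: it assumes zero symbols before the block (so the linear part becomes a square, invertible lower-triangular Toeplitz matrix and d = 0), a real-valued channel, a single hypothetical third-order term `c3 * x(n) x(n-1) x(n-2)`, and no noise. All names and values are illustrative.

```python
import numpy as np

def linear_matrix(h1, N):
    """Lower-triangular Toeplitz matrix of the first-order kernel
    (zero symbols assumed before the block)."""
    H = np.zeros((N, N))
    for tau, coeff in enumerate(h1):
        H += coeff * np.eye(N, k=-tau)
    return H

def nonlinear_part(x, c3):
    """Toy third-order term c3 * x(n) x(n-1) x(n-2) with zero initial conditions."""
    x1 = np.concatenate(([0.0], x[:-1]))
    x2 = np.concatenate(([0.0, 0.0], x[:-2]))
    return c3 * x * x1 * x2

def fixed_point_equalize(y, G, c3, iterations):
    """Iterate x <- G(y - H_nl(x)), starting from the linear equalizer output."""
    x_hat = G @ y
    for _ in range(iterations):
        x_hat = G @ (y - nonlinear_part(x_hat, c3))
    return x_hat

rng = np.random.default_rng(0)
N, c3 = 50, 0.05
H = linear_matrix([1.0, 0.4, 0.2], N)
x = rng.choice([-1.0, 1.0], size=N)
y = H @ x + nonlinear_part(x, c3)             # noise-free channel output
G = np.linalg.inv(H)                          # exact linear inverse for this toy case
x_lin = G @ y                                 # linear equalizer alone
x_fp = fixed_point_equalize(y, G, c3, iterations=30)
```

Because the nonlinearity is mild, the map is a contraction around the true input and the iterates approach it, while the purely linear output `x_lin` keeps a residual caused by the uncorrected third-order term.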
A drawback of the fixed point equalizer is the error introduced into
the fixed point equation associated with the inverse of the
first-order kernel. The error depends on the length K, delay d, and
the location of the zeros of $H_1$. The fixed point equalizers in the
following two sections eliminate this source of error, and allow for
zero forcing equalization of the linear component (along with the
nonlinear component) of the channel in the noise-free case.
4. MULTICHANNEL FIXED POINT
EQUALIZER
The availability of multiple observations per symbol period at the
receiver has become more common in many communication systems. Using a
superscript (s) to denote the channel, the following assumption is
made:
Assumption (A4): There are no common zeros across all of the linear
components $\{H_1^{(s)}\}_{s=1}^{S}$ of the channels.
Figure 2: A single-input/multiple-output Volterra system.
It is well known that for multiple linear channels, FIR zero forcing
equalization is possible if (A4) is satisfied [7]. In this section it
is shown that these linear multichannel equalization techniques can be
combined with the fixed point equalizer, to allow for zero forcing
equalization of Volterra channels with mixed-phase first-order kernels
using as few as two channels.
Consider the multichannel Volterra system shown in Fig. 2 and again
assume that (A1) and (A2) are satisfied. For the sth channel write:
$$y_0^{(s)} = H_N^{(s)} x_{L_1} + H_{2:J}^{(s)}(x_L) + v_0^{(s)},$$
where $H_N^{(s)} = T_N(h_1^{(s)})$. Rearranging terms and applying the
linear operator $G_m^{(s)}$ with memory K to both sides yields:
$$G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1} + v_K^{(s)}) = G_m^{(s)}(y_K^{(s)}) - G_m^{(s)} H_{2:J}^{(s)}(x_{K+L}). \quad (4)$$
Because (4) holds for each channel s, it is possible to sum the
results for all S channels and write:
$$\sum_{s=1}^{S} G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1} + v_K^{(s)}) = \sum_{s=1}^{S} G_m^{(s)}(y_K^{(s)}) - \sum_{s=1}^{S} G_m^{(s)} H_{2:J}^{(s)}(x_{K+L}). \quad (5)$$
If it is possible to make the left hand side of (5) $x_0$, then the
result will be the desired fixed point equation. Define the error term
as
$$\epsilon_m = x_0 - \sum_{s=1}^{S} G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1} + v_K^{(s)}).$$
If (A4) is satisfied, then in the noise-free case a Kth-order FIR zero
forcing solution exists such that
$$\sum_{s=1}^{S} G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1}) = x_0,$$
provided that $S(K+1) \geq K + L_1 + 1$ [7].
Figure 3: A single channel Volterra system with precoder.
Define the multichannel filtering matrix $H_{m,K+1}$ and the
multichannel Kth-order equalizer $g_m$ corresponding to
$\{G_m^{(s)}\}_{s=1}^{S}$ as:
$$H_{m,K+1} = [H_{K+1}^{(1),T}, \ldots, H_{K+1}^{(S),T}]^T,$$
$$g_m = [g_m^{(1),T}, \ldots, g_m^{(S),T}]^T.$$
The zero forcing equalizer can be recovered as [7]:
$$g_m = (H_{m,K+1}^T)^{\dagger} e_{K+L_1+1}, \quad (6)$$
where $e_{K+L_1+1}$ is a $(K + L_1 + 1) \times 1$ vector with a one in
the $(K + L_1 + 1)$th position and zeros elsewhere.
Substitution of the operator $G_m$ associated with the filter $g_m$
designed according to (6) into (5) results in the fixed point
equation:
$$x_0 = \sum_{s=1}^{S} G_m^{(s)}(y_K^{(s)}) - \sum_{s=1}^{S} G_m^{(s)} H_{2:J}^{(s)}(x_{K+L}) + \epsilon_m.$$
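The zero forcing design in (6) can be checked numerically on the linear part alone. The following sketch assumes two randomly drawn complex first-order kernels (which have no common zeros with probability one), a hypothetical order K chosen to satisfy the dimension condition, and the plain-transpose convention in which the equalizer output is the inner product of `g_m` with the stacked observations. The helper and variable names are illustrative, not the paper's.

```python
import numpy as np

def filtering_matrix(c_taps, n_rows):
    """T_N(c) for taps [c(0), ..., c(Q)]: each row is [c(Q), ..., c(0)], shifted per row."""
    c_taps = np.asarray(c_taps, dtype=complex)
    q = len(c_taps) - 1
    T = np.zeros((n_rows, n_rows + q), dtype=complex)
    for i in range(n_rows):
        T[i, i:i + q + 1] = c_taps[::-1]
    return T

rng = np.random.default_rng(1)
S, L1, K = 2, 3, 3                            # hypothetical sizes
assert S * (K + 1) >= K + L1 + 1              # dimension condition for a zero forcing solution

channels = [rng.standard_normal(L1 + 1) + 1j * rng.standard_normal(L1 + 1)
            for _ in range(S)]
# Stack the per-channel filtering matrices: S(K+1) x (K+L1+1).
H_m = np.vstack([filtering_matrix(h, K + 1) for h in channels])

e = np.zeros(K + L1 + 1)
e[-1] = 1.0                                   # select x(n): zero-delay equalization
g_m = np.linalg.pinv(H_m.T) @ e               # equalizer taps for all channels, stacked
response = H_m.T @ g_m                        # combined channel-equalizer response
```

In this noise-free setting `response` equals the unit vector `e` up to rounding, regardless of whether the individual kernels are minimum-phase, which is the property the fixed point equalizer of this section relies on.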
5. BLOCK BASED PRECODING FIXED
POINT EQUALIZATION
As an alternative to using multiple channels at the receiver to
improve the single channel inversion problem, structured redundancy
could be introduced at the transmitter. By block precoding at the
transmitter and block equalization at the receiver, FIR zero forcing
equalization of single channel systems is possible irrespective of the
location of channel zeros [5]. As in the multichannel case, these
properties can be extended to fixed point equalization of Volterra
channels.
Consider the block-based transmission scheme of Fig. 3. At the
transmitter, data symbols w(n) are collected into a block of length M:
$$w = [w(0), \ldots, w(M-1)]^T,$$
and mapped by the precoder $F_p$ to the length N block of channel
inputs $x_0$. If the precoder is linear, then it can be represented by
the N x M matrix $F_p$. The precoder structure is chosen to satisfy
the following two assumptions [5]:
Assumption (A5): The lengths L, M, and N satisfy N = L + M.
Assumption (A6): rank($F_p$) = M, and the last L rows of $F_p$ are
zero.
As a result of (A6), $F_p$ can be decomposed as
$$F_p = \begin{bmatrix} \bar{F}_p \\ 0_{L \times M} \end{bmatrix},$$
where the M x M matrix $\bar{F}_p$ is nonsingular. Using (A6) it is
possible to write:
$$x_L = \begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}.$$
The N row filtering matrix for the first-order kernel
$H_N = T_N(h_1)$ can be decomposed as
$$H_N = [\tilde{H}_N \;\; \bar{H}_N \;\; \hat{H}_N],$$
where $\tilde{H}_N$ is N x L, $\bar{H}_N$ is N x M, and $\hat{H}_N$ is
N x L. Using these definitions, the input/output relationship for the
block-based system with precoding can be written as
$$y_0 = \bar{H}_N \bar{F}_p w + H_{2:J}\left(\begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}\right) + v_0.$$
Rearranging terms and applying the linear operator $G_p$ to both sides
yields:
$$G_p(\bar{H}_N \bar{F}_p w + v_0) = G_p(y_0) - G_p H_{2:J}\left(\begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}\right). \quad (7)$$
If the left hand side of (7) were w, then the desired fixed point
equation would result. Define the error term as
$$\epsilon_p = w - G_p(\bar{H}_N \bar{F}_p w + v_0). \quad (8)$$
If (A5) and (A6) are satisfied, then in the noise-free case, a zero
forcing solution $G_p$ (with matrix form $\mathbf{G}_p$) to (8) exists
such that [5]:
$$\mathbf{G}_p \bar{H}_N \bar{F}_p w = w.$$
The zero forcing equalizer can be recovered as [5]:
$$\mathbf{G}_p = \bar{F}_p^{-1} \bar{H}_N^{\dagger}. \quad (9)$$
Substitution of the operator $G_p$ associated with the matrix
$\mathbf{G}_p$ designed according to (9) into (7) results in the fixed
point equation:
$$w = G_p(y_0) - G_p H_{2:J}\left(\begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}\right) + \epsilon_p.$$
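The block zero forcing design in (9) can be sketched for the linear part of a mixed-phase channel. The example below is an illustration under assumptions that are not from the paper: a hypothetical real three-tap kernel with one zero inside and one outside the unit circle, an identity precoder block, a 4-PAM data block, and no noise or nonlinear terms.

```python
import numpy as np

def filtering_matrix(c_taps, n_rows):
    """T_N(c) for taps [c(0), ..., c(Q)]: each row is [c(Q), ..., c(0)], shifted per row."""
    c_taps = np.asarray(c_taps, dtype=float)
    q = len(c_taps) - 1
    T = np.zeros((n_rows, n_rows + q))
    for i in range(n_rows):
        T[i, i:i + q + 1] = c_taps[::-1]
    return T

h1 = np.array([1.0, -2.5, 1.0])               # zeros at 0.5 and 2: mixed phase
L = len(h1) - 1
M = 8
N = L + M                                     # assumption (A5)
F_bar = np.eye(M)                             # nonsingular M x M precoder block

H_N = filtering_matrix(h1, N)                 # N x (N + L)
H_bar = H_N[:, L:L + M]                       # middle N x M block acting on F_bar w
G_p = np.linalg.inv(F_bar) @ np.linalg.pinv(H_bar)   # equation (9)

rng = np.random.default_rng(4)
w = rng.choice([-3.0, -1.0, 1.0, 3.0], size=M)
x_L = np.concatenate((np.zeros(L), F_bar @ w, np.zeros(L)))  # guard zeros from (A6)
y0 = H_N @ x_L                                # noise-free linear channel output
w_hat = G_p @ y0
```

The trailing zeros make the middle block `H_bar` a tall Toeplitz matrix with full column rank, so its pseudo-inverse recovers `w` exactly even though the kernel itself has no stable causal inverse.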
6. SIMULATIONS
We considered a third-order baseband Volterra system with $L_1 = 5$
and $L_3 = 2$, whose complex kernel coefficients' real and imaginary
parts were chosen randomly from [-0.5, 0.5], with the third-order
kernel scaled by 0.03 such that the nonlinear to linear power ratio is
-23 dB. A 16-QAM input was used, and additive white Gaussian noise was
present at the channel output. For each data point we generated 100
blocks of N = 100 symbols for 100 different channels.
For the multichannel fixed point simulations we used S = 4 channels
and the linear component of the equalizer designed according to (6)
with order K = 8. The single channel fixed point simulations (with and
without precoding) used the first of the multichannel fixed point
simulations' channels. The standard single channel fixed point
equalizer's linear component was designed according to (3) with K = 32
and d = 16. The linear component of the single channel fixed point
equalizer with precoding was designed according to (9), with a data
block length of M = N - L = 95 and precoder
$\bar{F}_p = I_{M \times M}$. For each of the fixed point equalizers,
5 iterations of their respective fixed point equation were performed.
For our performance metric, we calculate the signal to interference
ratio (SIR), defined in terms of the MSE of the equalizer output:
$$\mathrm{SIR} = -10 \log_{10} \mathrm{MSE} \;\; (\mathrm{dB}),$$
vs. SNR. The SIR allows us to assess the ability of the equalizers to
cope with both the noise and the nonlinearity. Fig. 4 compares the
output of each of the fixed point equalizers, along with the
corresponding outputs of the linear components of the equalizers.
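The SIR metric defined above is simple to compute. The helper below is a minimal sketch; the function name `sir_db` and the averaging over a single block are assumptions made here.

```python
import numpy as np

def sir_db(x_true, x_hat):
    """SIR = -10 log10(MSE) in dB, with the MSE taken between the
    transmitted symbols and the equalizer output."""
    err = np.asarray(x_hat) - np.asarray(x_true)
    mse = np.mean(np.abs(err) ** 2)
    return -10.0 * np.log10(mse)

# A constant error of 0.1 per symbol gives an MSE of 0.01.
example = sir_db(np.ones(4), np.ones(4) + 0.1)
```

Because both residual nonlinear distortion and noise contribute to the MSE, a single number captures how well an equalizer copes with the two together.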
7. CONCLUSIONS
In this paper we showed that multichannel and block based precoding
linear channel equalization techniques can be combined with the fixed
point method for zero forcing equalization of Volterra channels with
mixed-phase first-order kernels. Since the fixed point equalizer takes
the form of a nonlinear correction added to a linear inverse, it is a
practical addition to existing linear channel equalization schemes.
Figure 4: Comparing the linear and fixed point equalizer outputs.
(a) Single Channel. (b) Multichannel.
REFERENCES
[1] G. Giannakis and E. Serpedin, "Linear multichannel blind
equalizers of nonlinear FIR Volterra channels," IEEE Transactions on
Signal Processing, vol. 45, no. 1, pp. 67-81, 1997.
[2] S. Maas, "Analysis and optimization of nonlinear microwave
circuits by Volterra-series analysis," Microwave Journal, vol. 33,
no. 4, pp. 245-251, Apr. 1990.
[3] R. Nowak and B. Van Veen, "Volterra filter equalization: A fixed
point approach," IEEE Transactions on Signal Processing, vol. 45,
no. 2, pp. 377-388, 1997.
[4] A. Redfern and G. Zhou, "A fixed point equalizer for nonlinear
communication channels," Proceedings of the Thirty-Third CISS,
Baltimore, MD, Mar. 1999, to appear.
[5] A. Scaglione, G. Giannakis, and S. Barbarossa, "Redundant
filterbank precoders and equalizers part I: Unification and optimal
designs," IEEE Transactions on Signal Processing, vol. 47, no. 7,
pp. 1988-2006, 1999.
[6] M. Schetzen, The Volterra and Wiener Theories of Nonlinear
Systems. New York: John Wiley and Sons, 1980.
[7] D. Slock, "Blind fractionally-spaced equalization,
perfect-reconstruction filter banks and multichannel linear
prediction," Proceedings of the IEEE ICASSP, pp. 585-588, Adelaide,
Australia, Apr. 1994.
[8] C. Tseng and E. Powers, "Nonlinear channel equalization in digital
satellite systems," Proceedings of the IEEE Globecom, pp. 1639-1643,
Houston, TX, Nov. 1993.
JOINT ESTIMATION OF PROPAGATION PARAMETERS IN MULTICARRIER
SYSTEMS
Said Aouada and Adel Belouchrani
Electrical Engineering Department,
Ecole Nationale Polytechnique
P.O. Box 182 El Harrach 16200, Algiers, Algeria
E-mail: belouchrani@hotmail.com
ABSTRACT
A joint propagation parameter estimation method for MultiCarrier
systems is proposed. The main difference between Single Carrier and
MultiCarrier models is outlined and handled in the derivation of the
algorithm. The method uses a subspace-based 2-D ESPRIT-like approach,
exploiting the frequency shift invariance of the system as well as the
ULA geometry to provide closed-form estimation. The basic performance
of the algorithm is illustrated through simulations and compared with
the Cramer-Rao bound.
1. INTRODUCTION
In several wireless systems, the transmitted signals are subject to
the effects of multipath channels, caused by remote terrestrial
objects and inhomogeneities in the physical medium. Estimation of the
multipath propagation parameters from measurements at a multisensor
antenna provides a better channel characterization for subsequent
processing. These parameters include, among others, the Direction Of
Arrival (DOA) and Time Difference Of Arrival (TDOA) of each path. In
MultiCarrier Modulation (MCM) systems such as Digital Terrestrial
Television Broadcasting (DTTB) and Digital Audio Broadcasting (DAB),
the transmitted signals are subject to the effects of a multipath
channel, in the same way as in Single Carrier Modulation (SCM)
systems.
Herein, we investigate the possibility of performing closed-form Joint
Angle and Delay Estimation (JADE) for an MCM system in a single batch,
in a way similar to JADE for SCM systems, by exploiting the frequency
diversity of the system, together with a known array geometry. The
system consists of a single source and a single antenna array. A
channel model is derived to outline the frequency shift invariance
associated with the system. The model exploits the stationarity of the
parameters over the coherence time of the channel. It also takes into
account the fact that the unknown complex fadings differ from one
carrier to another. Both the uniform carrier spacing and a known array
geometry allow closed-form estimation of the propagation parameters.
More particularly, if the antenna is a Uniform Linear Array (ULA), or
has an ESPRIT doublet structure, JADE can be achieved using a 2-D
ESPRIT-like technique. The Cramer-Rao Bound on the variance of the
estimated parameters is also derived from the obtained model.
2. DATA MODEL
The principle of a MultiCarrier transmitter is depicted in Figure 1.
The concept is to transform serial data into parallel lower rate
inputs that are modulated by orthogonal carriers. The history and
applications of MCM are reported in [1], [2] and the references
therein and are not restated here for conciseness.
Figure 1: Block diagram of the MCM transmitter.
Assuming a single MCM source emitting over C carriers, the lowpass
equivalent transmitted signal is given by
$$x(t) = \sum_{c=1}^{C} \sum_{k=-\infty}^{\infty} s_c[k]\, g(t-kT)\, e^{j2\pi \frac{c}{T} t} \quad (1)$$
where
• $s_c[k]$ is the k-th symbol conveyed by carrier c,
• $\{s_c[k]\}$, c = 1, ..., C, are independent from one carrier to
another and identically distributed,
• g(t) is the pulse-shape function,
• T is the symbol duration, and
• 1/T is the frequency spacing between two successive carriers.
In the following, the channel is fading and time varying. However, it
is regarded as stationary within its coherence time. Assuming C
carriers and perfect carrier phase and sampling time recovery, the
complex envelope of the lowpass received signal at an M-element
antenna array at time t can be written as
$$y(t) = \sum_{c=1}^{C} \sum_{k=-\infty}^{\infty} s_c[k]\, h_c(t-kT)\, e^{j2\pi \frac{c}{T} t} + z(t) \quad (2)$$
where $h_c(t) = [h_{c,1}(t) \;\; h_{c,2}(t) \;\; \ldots \;\; h_{c,M}(t)]^T$
is the transmission channel associated with the c-th carrier,
$s_c[k]$ is the k-th symbol of duration T conveyed by carrier c and
z(t) is the additive white Gaussian noise. The coherence time of the
channel is assumed to range over K symbol periods. The channel
$h_c(t)$ can be modeled as [3]
$$h_c(t) = \sum_{q=1}^{Q} a(\theta_q)\, \beta_c(q)\, g(t-\tau_q)\, e^{-j2\pi \frac{c}{T} \tau_q} \quad (3)$$
where Q is the number of paths, $\theta_q$ and $\tau_q$ are the q-th
angle of arrival and time delay respectively and $\beta_c(q)$ is the
complex attenuation, which varies from carrier to carrier.
$a(\theta_q)$ is the (M x 1) vector of the array response to the q-th
path, with q = 1, ..., Q, and g(t) is the finite support modulation
pulse-shape function. We assume that the array outputs are received in
parallel over each carrier after demodulation. The channel length is
LT. We collect K data samples on each carrier. Using some trivial
manipulations, this can be expressed in an (M x K)-dimensional matrix
form as
$$Y_c = H_c S_c, \quad c = 1, \ldots, C \quad (4)$$
If the Toeplitz matrix of data symbols $S_c$, c = 1, ..., C, is known
from training and K > M, an estimate of the channel samples matrix
$H_c$ in (3) can be obtained for c = 1, ..., C, using least squares.
Blind estimation of the channel samples [4, 5] is also possible in
case $S_c$ is not known in advance. The estimated channel can be given
as
$$\hat{H}_c = H_c + N_c \quad (5)$$
where $N_c$ is the estimation noise matrix.
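The training-based least-squares step can be sketched for one carrier as follows. The sizes, the QPSK training sequence, the random channel matrix, and the specific Toeplitz indexing are hypothetical choices made for illustration, and the example is noise-free so that the estimate can be compared with the true channel.

```python
import numpy as np

rng = np.random.default_rng(2)
M, L, K = 2, 4, 40                            # sensors, channel taps, data samples

# Known QPSK training symbols (hypothetical); K + L - 1 symbols fill the matrix.
n_sym = K + L - 1
s = (rng.choice([-1.0, 1.0], size=n_sym)
     + 1j * rng.choice([-1.0, 1.0], size=n_sym)) / np.sqrt(2)

# Toeplitz symbol matrix S_c (L x K): column k holds the L most recent symbols.
S_c = np.array([[s[L - 1 + k - i] for k in range(K)] for i in range(L)])

H_c = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
Y_c = H_c @ S_c                               # model (4) without noise

# Least-squares estimate: right-multiply by the pseudo-inverse of S_c.
H_hat = Y_c @ np.linalg.pinv(S_c)
```

With noisy observations the same expression returns the channel plus an estimation noise term, which is the quantity the subsequent model treats as `N_c`.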
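Omitting the estimation noise, one can easily show that for each
carrier, the terms $e^{-j2\pi \frac{c}{T} \tau_q}$, q = 1, ..., Q, in
equation (5) can be factored out, resulting in
$$H_c := A_c\, \mathrm{diag}[e_c(\tau)]\, G, \quad c = 1, \ldots, C \quad (6)$$
where the (i, j)-th element of G is defined as
$$G_{i,j} = g((j-1)T - \tau_i), \quad i = 1, \ldots, Q \text{ and } j = 1, \ldots, L$$
and
$$A_c(\theta) = [\beta_1(c) a(\theta_1) \;\; \beta_2(c) a(\theta_2) \;\; \ldots \;\; \beta_Q(c) a(\theta_Q)] \quad (7)$$
$$\theta = [\theta_1 \;\; \theta_2 \;\; \ldots \;\; \theta_Q]^T \quad (8)$$
and
$$\tau = [\tau_1 \;\; \tau_2 \;\; \ldots \;\; \tau_Q]^T \quad (9)$$
If we stack all the matrices $H_c$ corresponding to all the C
carriers, we will obtain a large (MC x L)-dimensional matrix
$\mathcal{H}$ whose structure is given by
$$\mathcal{H} = \begin{bmatrix} H_1 \\ H_2 \\ \vdots \\ H_C \end{bmatrix} := U(\theta, \tau)\, G \quad (10)$$
where
$$U(\theta, \tau) = \begin{bmatrix} A_1(\theta)\,\mathrm{diag}[e_1(\tau)] \\ A_2(\theta)\,\mathrm{diag}[e_2(\tau)] \\ \vdots \\ A_C(\theta)\,\mathrm{diag}[e_C(\tau)] \end{bmatrix}. \quad (11)$$
Finally, we include the channel estimation noise matrix
$\mathcal{N}$, which is appropriately defined in accordance with (5)
and (10). Therefore, the model in (10) becomes
$$\hat{\mathcal{H}} = U(\theta, \tau)\, G + \mathcal{N} \quad (12)$$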
If we consider that the delay spread of the channel is $T_m$
(expressed in terms of the symbol period T), then the coherence
bandwidth of the channel is roughly the inverse of $T_m$. The
frequency separation between carriers in the MultiCarrier system is
given by $\Delta f = 1/T$. All the carriers that lie within a
frequency interval equal to the channel coherence bandwidth can be
seen as identically attenuated. Therefore, it is reasonable to assume
that the number of carriers being attenuated equally is $\mu$, the
integer part $\lfloor \cdot \rfloor$ of the ratio of the coherence
bandwidth to the carrier spacing $\Delta f$. Under this condition, the
number of $\mu$-carrier sets that share the same attenuation
coefficients is $m = C/\mu$.
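If we consider only the first $m\mu$ carriers in the derivation of the
MC-JADE model (10) ($m\mu$ is at most equal to C), we will obtain a
reduced MC-JADE model of dimension ($Mm\mu$ x L) satisfying the
following factorization
$$\hat{\mathcal{H}}_{m\mu} = U_{m\mu}(\theta, \tau)\, G + \mathcal{N}_{m\mu}
:= \begin{bmatrix} F_1(\tau) \circ A_1(\theta) \\ F_2(\tau) \circ A_2(\theta) \\ \vdots \\ F_m(\tau) \circ A_m(\theta) \end{bmatrix} G + \mathcal{N}_{m\mu} \quad (13)$$
where
$$A_i(\theta) = [\beta_1(i) a(\theta_1) \;\; \beta_2(i) a(\theta_2) \;\; \ldots \;\; \beta_Q(i) a(\theta_Q)]$$
and
$$F_i(\tau) = \begin{bmatrix}
\phi_1^{(i-1)\mu} & \phi_2^{(i-1)\mu} & \cdots & \phi_Q^{(i-1)\mu} \\
\phi_1^{(i-1)\mu+1} & \phi_2^{(i-1)\mu+1} & \cdots & \phi_Q^{(i-1)\mu+1} \\
\vdots & \vdots & & \vdots \\
\phi_1^{i\mu-1} & \phi_2^{i\mu-1} & \cdots & \phi_Q^{i\mu-1}
\end{bmatrix}$$
and
$$\phi_q = e^{-j2\pi \tau_q / T} \quad (14)$$
$\circ$ denotes the Khatri-Rao product, i.e., the columnwise Kronecker
product.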
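For readers unfamiliar with the Khatri-Rao product, the following sketch computes it with NumPy. The function name `khatri_rao` and the small test matrices are illustrative; the only requirement is that both factors have the same number of columns.

```python
import numpy as np

def khatri_rao(F, A):
    """Columnwise Kronecker product: column q of the result is kron(F[:, q], A[:, q])."""
    F = np.asarray(F)
    A = np.asarray(A)
    if F.shape[1] != A.shape[1]:
        raise ValueError("both factors need the same number of columns")
    # out[i, j, q] = F[i, q] * A[j, q]; merging (i, j) row-major gives the Kronecker order.
    out = np.einsum('iq,jq->ijq', F, A)
    return out.reshape(F.shape[0] * A.shape[0], F.shape[1])

F = np.array([[1, 2], [3, 4]])
A = np.array([[1, 10], [2, 20], [3, 30]])
P = khatri_rao(F, A)
```

In a stacked model of this kind, each column of the product collects one path's carrier phases (from the first factor) multiplied by that path's scaled array response (from the second factor).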
3. THE ALGORITHM: MC-JADE-ESPRIT
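If the array is Uniform Linear (ULA) or has an ESPRIT doublet
structure, then the angles and delays can be estimated jointly in
closed-form using an ESPRIT-like method. For the ULA geometry, the
steering vector $a(\theta_q)$ will be given by
$$a(\theta_q) = [1 \;\; \psi_q \;\; \ldots \;\; \psi_q^{M-1}]^T \quad (15)$$
where
$$\psi_q = e^{j2\pi \Delta \sin\theta_q}, \quad (16)$$
and $\Delta$ is the array sensor spacing in wavelengths.
With the parameter definitions (16) and (14), it is more appropriate
to rewrite (13) as
$$\hat{\mathcal{H}}_{m\mu} = U(\psi, \phi)\, G + \mathcal{N}_{m\mu} \quad (17)$$
where
$$\psi = [\psi_1 \;\; \psi_2 \;\; \ldots \;\; \psi_Q]^T \quad (18)$$
and
$$\phi = [\phi_1 \;\; \phi_2 \;\; \ldots \;\; \phi_Q]^T \quad (19)$$
Estimation of the channel subspace and its dimension is equivalent to
finding a basis E of the column span of the data matrix
$\hat{\mathcal{H}}_{m\mu}$, and estimation of the parameters $\psi$
and $\phi$ reduces to jointly diagonalizing the matrices
$E_\psi'^{\dagger} E_\psi''$ and $E_\phi'^{\dagger} E_\phi''$, where
$$E_\psi' = J_\psi' E, \quad E_\psi'' = J_\psi'' E \quad (20)$$
and
$$E_\phi' = J_\phi' E, \quad E_\phi'' = J_\phi'' E \quad (21)$$
where
$$\begin{aligned}
J_\psi' &= I_{m\mu} \otimes [I_{M-1} \;\; 0_{(M-1,1)}], &
J_\psi'' &= I_{m\mu} \otimes [0_{(M-1,1)} \;\; I_{M-1}], \\
J_\phi' &= I_m \otimes [I_{M(\mu-1)} \;\; 0_{(M(\mu-1),M)}], &
J_\phi'' &= I_m \otimes [0_{(M(\mu-1),M)} \;\; I_{M(\mu-1)}]
\end{aligned} \quad (22)$$
are the appropriate selection matrices (see [6], [7] and [8] for
details of JADE), $\otimes$ denotes the Kronecker product, $I_i$ is an
i-dimensional identity matrix and $0_{(i,j)}$ is an (i x j)-dimensional
matrix of zero elements.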
Details of the joint diagonalization are provided in [7] and the
references therein. The correct pairing between the $\psi$'s and the
$\phi$'s is guaranteed by the fact that the matrices share common
eigenvectors.
If the pulse-shape function is assumed to be known, the complex
attenuation coefficients can be linearly estimated using
least-squares, by processing the channel samples over each carrier
separately.
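The shift-invariance idea behind the method can be illustrated in one dimension. The sketch below is deliberately simplified and is not MC-JADE-ESPRIT: it estimates only the angles of a ULA from a noise-free low-rank matrix, using a single pair of selections and a plain eigendecomposition instead of the joint diagonalization of [7]. The array size, path angles, random mixing matrix, and the sign convention of the steering phase are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(3)
M, Q, L = 6, 3, 20                            # sensors, paths, columns (hypothetical)
delta = 0.5                                   # sensor spacing in wavelengths
theta_true = np.deg2rad([-15.0, 0.0, 25.0])

# ULA steering matrix: column q is [1, psi_q, ..., psi_q^(M-1)]^T.
psi = np.exp(2j * np.pi * delta * np.sin(theta_true))
A = psi[None, :] ** np.arange(M)[:, None]

G = rng.standard_normal((Q, L))               # stands in for the pulse-shape matrix
H = A @ G                                     # noise-free rank-Q data matrix

# Basis of the column span from the Q dominant left singular vectors.
U, _, _ = np.linalg.svd(H, full_matrices=False)
E = U[:, :Q]

# Selection: first M-1 sensors versus last M-1 sensors.
E_x, E_y = E[:-1, :], E[1:, :]
Psi = np.linalg.pinv(E_x) @ E_y               # similar to diag(psi_1, ..., psi_Q)
eigvals = np.linalg.eigvals(Psi)
theta_hat = np.sort(np.arcsin(np.angle(eigvals) / (2 * np.pi * delta)))
```

In the two-dimensional problem a second selection pair acts along the carrier dimension, and diagonalizing both resulting matrices with a common set of eigenvectors is what pairs each angle with its delay.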
4. IDENTIFIABILITY
Parameter identifiability requires the ($Mm\mu$ x L)-dimensional data
matrix $\mathcal{H}_{m\mu}$ to be of rank Q, with $Q < Mm\mu$ and
$Q < L$. This means that $U_{m\mu}(\theta, \tau)$ must have strictly
more rows than columns and be of full column rank, and G must have
more columns than rows and be of full row rank. The full rank
condition on G together with the channel factorization (6) implies
that all the delays must be distinct. If two paths have the same TDOA,
the rank of $\mathcal{H}_{m\mu}$ becomes Q - 1 and the corresponding
angles cannot be identified correctly. In this case, "spatial
smoothing" [7] can provide the solution [6], [7] by performing data
extension of the channel over each carrier in such a way as to keep
the rank of $\mathcal{H}_{m\mu}$ equal to the number of paths Q. In
order to allow selection of the received data (13), there must be at
least a pair of sensors, i.e., $M \geq 2$, and the coherence bandwidth
to carrier frequency-spacing ratio must be at least 2 : 1, i.e.,
$\mu \geq 2$. The last requirement can be satisfied by appropriately
increasing the number of carriers.
5. SIMULATIONS
The following simulation results illustrate the performance of
MC-JADE-ESPRIT. In all the experiments, the estimation Mean Square
Error (MSE) is averaged over 500 Monte Carlo runs of the algorithm and
compared against the Cramer-Rao Bound (CRB), which is derived for the
model (13) in the Appendix. In the figures corresponding to the
experiments, the MSE is plotted using a solid line whereas the CRB is
shown by a dotted line.
5.1. Basic performance of MC-JADE-ESPRIT
We consider an antenna of M = 2 elements, spaced at half wavelength.
The number of paths is Q = 3 with parameters
$\theta = [-15^\circ \;\; 0^\circ \;\; 25^\circ]^T$,
$\tau = [0 \;\; 0.078 \;\; 0.234]^T\, T$ and the path fadings being
generated from a complex zero-mean Gaussian distribution with variance
[0.4 0.3 0.3]. The channel length is half the symbol period T, which
is normalized to T = 1. The pulse-shape function is a raised cosine
with 0.25 roll-off factor. C = 64, with $\mu = 8$. The employed joint
diagonalization method is method "Q" as it is referred to in [7].
Fig. 2 shows the effect of the noise power on the MSE of the estimated
DOAs and TDOAs. At high noise powers, the estimation is strongly
sensitive to the channel estimation noise and is erroneous. As the
noise effect decreases, the difference with the CRB is about 2 to
3 dB.
5.2. Comparison with SI-JADE
For the same setting, we plot the CRB relative to the parameter
estimation over the first carrier, using SI-JADE [7], against the
noise power. The stacking parameter as defined in [7] is taken to be
$m_1 = 5$. The CRB of SI-JADE is plotted in Fig. 2 using a dashed
line. Here, for low estimation noise powers, the parameter MSE of
MC-JADE-ESPRIT is smaller than the CRB of SI-JADE. The greater
estimation precision for MC-JADE-ESPRIT is mainly due to the larger
amount of information involved.
Figure 2: Basic performance of MC-JADE-ESPRIT. (Panels: angle
estimation, delay estimation; vertical axis: MSE (dB); horizontal
axis: 1/noise (dB).)
5.3. Resolution of the Algorithm
We set the number of paths to Q = 2, with the estimation noise power
being fixed at -20 dB. All the other parameters are kept the same. As
expected, Fig. 3 shows that the estimation accuracy improves as the
angles become well separated, at which point it is governed by the
noise power. The effect of delay spacing on the angle and delay
estimation is shown in Fig. 4. It is clear that for small delay
spacing, ambiguity occurs and the full rank condition on the
pulse-shape function matrix is no longer satisfied, yielding an
erroneous estimation. Here, no spatial smoothing is applied. For well
separated delays, estimation is seen to depend only on the noise
power.
Figure 3: Spatial resolution of MC-JADE-ESPRIT. (Panels: angle
estimation, delay estimation; horizontal axis: angle spacing
(radian).)
Figure 4: Temporal resolution of MC-JADE-ESPRIT. (Panels: angle
estimation, delay estimation; horizontal axis: delay spacing (T).)
6. CONCLUSION
An advantage of the algorithm is that it takes into account the
available frequency diversity provided by the multiple carriers and
processes data in a single batch. However, estimation of the channel
impulse response is a prerequisite to the application of the
algorithm, which makes its performance suboptimal and sensitive to the
estimation noise.
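Appendix
The Cramer-Rao Bound
The CRB for the joint problem (13) can be derived as follows. Let us
define the parameter vector as
$$[\sigma^2 \;\; g^T(1) \;\; \cdots \;\; g^T(L) \;\; \eta^T]^T$$
where
$$\eta := [\Re\{\beta^T(1)\} \;\; \Im\{\beta^T(1)\} \;\; \cdots \;\; \Re\{\beta^T(m)\} \;\; \Im\{\beta^T(m)\} \;\; \theta^T \;\; \tau^T]^T$$
and $\Re\{.\}$ and $\Im\{.\}$ denote the real and imaginary parts
respectively. In our case, the vectors g(i), i = 1, ..., L, which are
the columns of matrix G in (13), are deterministic but unknown. The
data are the channel estimates $\hat{\mathcal{H}}_{m\mu}$. These data
are corrupted by the estimation noise
$$\mathcal{N}_{m\mu} = [n(1) \;\; n(2) \;\; \ldots \;\; n(L)]$$
where n(i), i = 1, ..., L, are complex, stationary, zero-mean Gaussian
random processes that are temporally uncorrelated. It follows that the
data $\hat{\mathcal{H}}_{m\mu}$ are also uncorrelated Gaussian random
processes. The likelihood function of the data is
$$\mathcal{L}(\hat{\mathcal{H}}_{m\mu}) = \frac{1}{(2\pi)^{Mm\mu L} (\sigma^2/2)^{Mm\mu L}} \exp\left\{ -\frac{1}{\sigma^2} \sum_{i=1}^{L} n^*(i)\, n(i) \right\} \quad (24)$$
and the corresponding loglikelihood function is
$$\Lambda = \ln \mathcal{L} = \mathrm{const} - Mm\mu L \ln \sigma^2 - \frac{1}{\sigma^2} \sum_{i=1}^{L} n^*(i)\, n(i) \quad (25)$$
where * denotes complex conjugate transpose. The derivatives of the
loglikelihood function $\Lambda$ with respect to the unknown
parameters can be obtained using results of [9], [6], [10], as
$$\frac{\partial \Lambda}{\partial \sigma^2} = -\frac{Mm\mu L}{\sigma^2} + \frac{1}{\sigma^4} \sum_{i=1}^{L} n^*(i)\, n(i)$$
$$\frac{\partial \Lambda}{\partial g(i)} = \frac{2}{\sigma^2} \Re[U^* n(i)]$$
$$\frac{\partial \Lambda}{\partial \eta} = \frac{2}{\sigma^2} \sum_{i=1}^{L} \Re\{\mathcal{G}^*(i)\, D^* n(i)\}$$
with $U = U(\theta, \tau)$, and
$$D = [D_\beta \;\; D_\theta \;\; D_\tau] \quad (Mm\mu \times 2(m+1)Q)$$
$$D_\beta = [D_{\Re\{\beta(1)\}} \;\; D_{\Im\{\beta(1)\}} \;\; \cdots \;\; D_{\Re\{\beta(m)\}} \;\; D_{\Im\{\beta(m)\}}]$$
$$D_{\Re\{\beta(i)\}} = \left[ \frac{\partial U}{\partial \Re\{\beta(i)_1\}} \;\; \cdots \;\; \frac{\partial U}{\partial \Re\{\beta(i)_Q\}} \right]$$
$$D_{\Im\{\beta(i)\}} = \left[ \frac{\partial U}{\partial \Im\{\beta(i)_1\}} \;\; \cdots \;\; \frac{\partial U}{\partial \Im\{\beta(i)_Q\}} \right]$$
$$D_\theta = \left[ \frac{\partial U}{\partial \theta_1} \;\; \cdots \;\; \frac{\partial U}{\partial \theta_Q} \right]$$
$$D_\tau = \left[ \frac{\partial U}{\partial \tau_1} \;\; \cdots \;\; \frac{\partial U}{\partial \tau_Q} \right]$$
$$\mathcal{G}(i) = I_{2(m+1)} \otimes g(i)$$
Using results of [9], [6], we get
$$E\left[\left(\frac{\partial \Lambda}{\partial \sigma^2}\right)^2\right] = \frac{Mm\mu L}{\sigma^4}$$
$$E\left[\frac{\partial \Lambda}{\partial g(i)} \left(\frac{\partial \Lambda}{\partial g(j)}\right)^T\right] = \frac{2}{\sigma^2} \Re[U^* U]\, \delta_{i,j}$$
$$E\left[\frac{\partial \Lambda}{\partial g(i)} \left(\frac{\partial \Lambda}{\partial \eta}\right)^T\right] = \frac{2}{\sigma^2} \Re[U^* D\, \mathcal{G}(i)]$$
$$E\left[\frac{\partial \Lambda}{\partial \eta} \left(\frac{\partial \Lambda}{\partial \eta}\right)^T\right] = \frac{2}{\sigma^2} \sum_{i=1}^{L} \Re[\mathcal{G}^*(i)\, D^* D\, \mathcal{G}(i)]$$
The Fisher Information Matrix (FIM) for the parameters is given by
$E(\omega \omega^T)$, where $\omega$ collects the derivatives of
$\Lambda$ with respect to
$[\sigma^2 \;\; g^T(1) \;\; \cdots \;\; g^T(L) \;\; \eta^T]$, and the
inverse of the CRB matrix for the parameters, after some
manipulations, is given by
$$\mathrm{CRB}^{-1}(\eta) = \frac{2}{\sigma^2} \sum_{i=1}^{L} \left\{ \Re[\mathcal{G}^*(i)\, D^* D\, \mathcal{G}(i)] - \Re[U^* D\, \mathcal{G}(i)]^T\, \Re[U^* U]^{-1}\, \Re[U^* D\, \mathcal{G}(i)] \right\}$$
Finally, the CRB matrix for the parameters of interest,
CRB($\theta$, $\tau$), is the 2Q-dimensional bottom-right-corner
partition matrix of CRB($\eta$) and the bounds are found by taking the
diagonal elements.
7. REFERENCES
[1] J. A. C. Bingham, "Multicarrier Modulation for Data Transmission:
An Idea Whose Time Has Come", IEEE Communications Magazine, vol. 28,
no. 5, May 1990.
[2] I. Kalet, "The Multitone Channel", IEEE Transactions on
Communications, vol. 37, no. 2, February 1989.
[3] L. Vandendorpe and O. van de Wiel, "MIMO DFE Equalization for
Multitone DS/SS Systems over Multipath Channels", IEEE Transactions on
Communications, vol. 14, no. 3, April 1996.
[4] A. Belouchrani and M. G. Amin, "Blind Source Separation Based on
Time-Frequency Signal Representations", IEEE Transactions on Signal
Processing, vol. 46, no. 11, November 1998.
[5] K. Abed-Meraim and Y. Hua, "Blind Identification of Multi-Input
Multi-Output System Using Minimum Noise Subspace", IEEE Transactions
on Signal Processing, vol. 45, no. 1, January 1997.
[6] M. C. Vanderveen, A. J. van der Veen and A. Paulraj, "Estimation
of Multipath Parameters in Wireless Communications", IEEE Transactions
on Signal Processing, vol. 46, no. 3, March 1998.
[7] A. J. van der Veen, M. C. Vanderveen, and A. Paulraj, "Joint Angle
and Delay Estimation Using Shift-Invariance Techniques", IEEE
Transactions on Signal Processing, vol. 46, no. 2, February 1998.
[8] A. J. van der Veen, M. C. Vanderveen and A. Paulraj, "Joint Angle
and Delay Estimation Using Shift Invariance Properties", IEEE Signal
Processing Letters, vol. 4, no. 5, May 1997.
[9] P. Stoica and A. Nehorai, "MUSIC, Maximum Likelihood, and the
Cramer-Rao Bound", IEEE Transactions on Acoustics, Speech, and Signal
Processing, vol. 37, no. 5, May 1989.
[10] S. M. Kay, "Fundamentals of Statistical Signal Processing:
Estimation Theory", Prentice-Hall, 1993.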
OFDM SPECTRAL CHARACTERIZATION: ESTIMATION OF THE BANDWIDTH
AND THE NUMBER OF SUB-CARRIERS
Walter AKMOUCHE
Eric KERHERVE, Andre QUINQUIS
CELAR/TCOM/TR - BP 7
35 174 BRUZ CEDEX - FRANCE
e-mail: akmouche@celar.fr
ENSIETA - 2, rue F. VERNY
29 200 BREST - FRANCE
quinquis@ensieta.fr
ABSTRACT
This paper deals with the analysis of modulated signals in an NDA (Non-Data-Aided) context. Assuming that an OFDM signal has been detected, our goal is to estimate the bandwidth and the number of sub-carriers of this signal. First, we propose an algorithm based on wavelet decomposition to estimate the bandwidth: the bandwidth is estimated with an error lower than 8 % in 100 % of the cases for SNR down to 3 dB. Second, we apply the MUSIC algorithm with a decision criterion to obtain the number of sub-carriers: the number of carriers is estimated with an error lower than or equal to 9 % in 100 % of the cases for SNR down to 10 dB.
1. INTRODUCTION
Spectrum survey requires the estimation of the parameters of the received signals. This problem has already been studied for single-carrier modulations, and must now cope with new modulation types such as OFDM (Orthogonal Frequency Division Multiplexing), which are increasingly used (DAB, ADSL, ...). In [2], we proposed a method to detect OFDM signals versus linear single-carrier modulated signals. The problem we now want to solve is the estimation of two main parameters of such a signal: the bandwidth and the number of sub-carriers. Using the fact that the power spectral density (PSD) of an OFDM signal has a rectangular shape, we propose to apply a wavelet decomposition to detect the breaking points at its beginning and at its end. Then, we try to determine the number of sub-carriers. Since this number is unknown, AR modeling is impossible. Therefore, the MUSIC algorithm with a decision criterion seems well suited to this problem.
In section 2 we give the problem statement. Section 3 is dedicated to the bandwidth estimation of OFDM signals, with its performance. In section 4 we give a method to obtain the number of sub-carriers. Section 5 concludes the paper.
2. PROBLEM STATEMENT
OFDM is a multiplexing of single-carrier signals and can therefore be expressed as a sum of single-carrier modulated signals:
$$x(t) = \sqrt{\frac{P}{N_p}} \sum_{k} \sum_{n=0}^{N_p-1} c_{n,k}\, e^{2i\pi (f_0 + n \Delta f) t}\, g(t - k T_s) \qquad (1)$$
where $\{c_{n,k}\}$ is the symbol sequence, assumed to be centered and i.i.d., $N_p$ is the number of sub-carriers, $\Delta f$ the frequency spacing between carriers, $g(t)$ the pulse function and $P$ the power of the signal. $T_s = T_u + T_g$, where $T_u$ is the “useful time” during which information is sent, $T_g$ is the guard interval and $T_s$ the duration of the complete OFDM symbol. We suppose here that the guard interval is empty.
Due to the multiplexing of many single-carrier signals, the spectrum of the OFDM signal is nearly rectangular (Fig. 1). We assume that we receive the complex signal $r(t) = x(t) + b(t)$, where $x(t)$ is the OFDM baseband signal (with possible frequency and time offsets) and $b(t)$ is a complex white Gaussian noise.
Figure 1: Spectrum amplitude of OFDM signal with 32
carriers.
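The signal model of this section can be simulated with a short sketch. The QPSK alphabet, the rectangular pulse, the carrier spacing $\Delta f = 1/T_u$, the amplitude normalization $\sqrt{P/N_p}$ and the function name `ofdm_baseband` are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def ofdm_baseband(n_symbols, n_carriers=32, t_u=128, power=1.0,
                  f0=0.0, snr_db=None, seed=0):
    """Return (r, x): noisy and noise-free OFDM samples, no guard interval."""
    rng = np.random.default_rng(seed)
    # Unit-modulus QPSK symbols c[k, n] for OFDM symbol k and sub-carrier n
    # (the alphabet is an assumption of this sketch).
    bits = rng.integers(0, 2, size=(2, n_symbols, n_carriers))
    c = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)
    t = np.arange(n_symbols * t_u)             # sample index
    delta_f = 1.0 / t_u                        # carrier spacing, cycles/sample
    freqs = f0 + delta_f * np.arange(n_carriers)
    carriers = np.exp(2j * np.pi * np.outer(t, freqs))
    # Rectangular pulse g(t): each symbol is held for t_u samples.
    held = np.repeat(c, t_u, axis=0)
    x = np.sqrt(power / n_carriers) * np.sum(held * carriers, axis=1)
    if snr_db is None:
        return x, x
    # Complex white Gaussian noise b(t) scaled to the requested SNR.
    noise_var = power / 10 ** (snr_db / 10)
    b = np.sqrt(noise_var / 2) * (rng.standard_normal(x.size)
                                  + 1j * rng.standard_normal(x.size))
    return x + b, x
```

With 32 carriers and 128 samples per symbol, the occupied band is about a quarter of the sampling rate, which gives the nearly rectangular spectrum described above.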
3. BANDWIDTH ESTIMATION
3.1. Continuous wavelet decomposition (CWT)
From a signal point of view, wavelets provide a linear decomposition of a signal onto a given waveform that is translated in time and dilated or compressed in time [1]. In the frequency domain, wavelet analysis is closely related to filtering the data through a bank of filters having constant quality factors (constant Q). The continuous wavelet transform (CWT) maps a one-dimensional analog signal $s(t)$ to a set of wavelet coefficients which vary continuously over time $b$ and scale $a$:
$$W(a,b) = a^{-1/2} \int_{-\infty}^{+\infty} \psi^*\!\left(\frac{t-b}{a}\right) s(t)\, dt$$
where $W(a,b)$ denotes the wavelet transform and $\psi(t)$ is the wavelet used in the decomposition. Equivalently, the CWT can be expressed as:
$$W(a,b) = a^{1/2} \int_{-\infty}^{+\infty} \hat{\psi}^*(a\nu)\, S(\nu)\, e^{2i\pi\nu b}\, d\nu$$
with $\hat{\psi}(\nu)$ and $S(\nu)$ the Fourier transforms of $\psi(t)$ and $s(t)$ respectively. Wavelets must satisfy some restrictions [1], the most important of which are integrability and square integrability. These conditions imply that, if $\hat{\psi}(\nu)$ is a smooth function in the neighborhood of the frequency origin, then $\hat{\psi}(0) = 0$, which means that $\psi(t)$ has no DC component. Other assumptions about wavelets can be made for convenience. One such requirement is that $\hat{\psi}(\nu) = 0$ for $\nu < 0$. It is also convenient to assume that $\hat{\psi}(\nu)$ is real for $\nu > 0$.
The wavelet functions $\psi\!\left(\frac{t-b}{a}\right)$ are used to band-pass filter the signal. This can be seen as a kind of time-varying spectral analysis in which the scale $a$ plays the role of a local frequency. As $a$ increases, wavelets are stretched and analyze low frequencies, while for small $a$, contracted wavelets analyze high frequencies. The parameter $b$ controls the temporal location. The scalar product measures the signal $s(t)$ in the space spanned by all the dilated or contracted versions of the single function $\psi$. For the analysis, the dilation parameter $a$ is given a large initial value (e.g. 1.0) and is then decreased in regular steps to examine the signal in more detail. Equivalently, the wavelet filter examines successive narrow sections of the signal spectrum $S(\nu)$. Since spectral properties are frequently better displayed on a logarithmic frequency scale, it is convenient to write $a = 2^{-u}$. With this notation, integer increments of $u$ correspond to octave steps in $a$. Note that a small $a$ (i.e. a large $u$) corresponds to high frequencies. A small $u$ corresponds to an analysis of the large-scale features of $s(t)$, and as $u$ increases, finer details of the signal come into focus.
The function $\psi(t)$ is the basic unshifted and undilated wavelet. It may be chosen to suit the application [5]. In our case, $\psi(t) = e^{-t^2/2 + jmt}$ is the Morlet wavelet. An important property of this basic wavelet is that it is concentrated in both the time and frequency domains, which means that its time-bandwidth product is as small as possible. To satisfy $\hat{\psi}(0) = 0$, one must add a correction term, but if $m > 5$ this correction term is negligibly small and can be omitted.
One problem of practical interest for engineers is the detection of abnormal features. Generally, a discretization procedure is needed since we consider digital data. This procedure consists of a high-resolution digitization of the generating wavelet in the time domain, truncated on its sides in order to have a finite extent. The wavelet coefficients $C_{j,k}$ of the time-frequency decomposition are then obtained by correlating, in the time domain, the interpolated digitized wavelets $\psi_{j,k}$ with the discrete signal $s(n)$ for different values of the dilation factor $2^j$ and of the time shift $k$. This approach has some drawbacks, such as the edge effects due to the correlation of a finite-duration signal with a truncated infinite wavelet and the numerical approximations due to truncation.
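The frequency-domain form of the CWT given above lends itself to a compact FFT-based sketch. This is an illustration under stated assumptions rather than the discretization the authors describe: scales are expressed in samples, the shift $b$ is restricted to the sample grid, the FFT implies periodic extension of the signal, and the name `morlet_cwt` is hypothetical.

```python
import numpy as np

def morlet_cwt(s, scales, m=6.0):
    """CWT of s with an analytic Morlet wavelet, one row per scale."""
    s = np.asarray(s)
    nu = np.fft.fftfreq(s.size)                # frequency in cycles/sample
    spectrum = np.fft.fft(s)
    rows = []
    for a in scales:
        # Fourier transform of exp(-t^2/2 + j m t), dilated by a and kept
        # only for nu > 0 (the small correction term is omitted, m > 5).
        psi_hat = np.sqrt(2 * np.pi) * np.exp(-0.5 * (2 * np.pi * a * nu - m) ** 2)
        psi_hat[nu <= 0] = 0.0
        # W(a, b) = sqrt(a) * inverse transform of psi_hat(a nu) S(nu).
        rows.append(np.sqrt(a) * np.fft.ifft(spectrum * psi_hat))
    return np.array(rows)
```

For a pure tone at frequency $\nu_0$, the response is concentrated at the scale where $2\pi a \nu_0 \approx m$, which illustrates how the scale plays the role of a local frequency.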
3.2. Bandwidth estimation method
The beginning and the end of the PSD of an OFDM signal, called $R(f)$, are breaking points and can be easily detected by using a wavelet decomposition [4]. We choose the Morlet wavelet for analyzing the PSD and obtain the scalogram of the PSD (Fig. 2).
Figure 2: Scalogram of the PSD of the received signal r(t). 1024 samples. SNR = 3 dB.
Nevertheless, we have to admit that an estimation based on this figure is purely visual. For that reason, we project the resulting scalogram to obtain its frequency marginal. Because wavelet analysis is a constant $\Delta f / f$ transformation, we have to sum the energy in a cone, instead of summing the energy of a column as in the case of a bilinear time-frequency transformation. Moreover, we cannot be sure that the wavelet has the same energy in each time-frequency logon. Consequently, we propose to calculate the scalogram of the Dirac distribution, which has a cone shape and specifically characterizes breaking points. Considering this scalogram, it becomes easy to keep only the points with enough energy (i.e. more energy than a given percentage of the total energy of the signal) and then to form a description mask. We then obtain the bandwidth estimation algorithm:
1. Apply the Dirac mask on the scalogram of the stud¬
ied PSD signal R(f) for each frequency localization.
2. Calculate the sum of the energy, which gives the fre¬
quency marginal of the scalogram .
3. Search for the two extrema located in the beginning
and the end of the bandwidth.
Two options are possible to calculate the energy of the scalogram of $R(f)$ in the cone of the Dirac mask. First, we can use a binary mask, which means that the weight is equal to “1” if the point belongs to the cone and “0” otherwise. The second solution consists in using a weighted Dirac mask, which gives the real energy of each logon after thresholding. We show in Fig. 3 that the second solution leads to the right frequency marginal.
Figure 3: Frequency marginals of the scalogram in the
case of binary and weighted Dirac mask.
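The three steps of the bandwidth estimation algorithm, with the weighted Dirac mask, can be sketched as follows. Several choices here are assumptions for illustration and are not taken from the paper: the scales (in PSD bins), the threshold expressed as a fraction of the mask maximum, the use of the spectral centroid to separate the two extrema, and the function names.

```python
import numpy as np

def scalogram(s, scales, m=6.0):
    """Squared magnitude of an FFT-based Morlet CWT (rows = scales)."""
    nu = np.fft.fftfreq(len(s))
    spectrum = np.fft.fft(s)
    out = np.empty((len(scales), len(s)))
    for i, a in enumerate(scales):
        psi_hat = np.exp(-0.5 * (2 * np.pi * a * nu - m) ** 2) * (nu > 0)
        out[i] = np.abs(np.sqrt(a) * np.fft.ifft(spectrum * psi_hat)) ** 2
    return out

def estimate_band_edges(psd, scales=(4, 8, 16), keep=0.05):
    """Return (low, high) bin indices of the two PSD breaking points."""
    psd = np.asarray(psd, dtype=float)
    n = psd.size
    # Weighted Dirac mask: scalogram of a unit impulse at bin 0, thresholded.
    dirac = np.zeros(n)
    dirac[0] = 1.0
    mask = scalogram(dirac, scales)
    mask[mask < keep * mask.max()] = 0.0
    # Step 1: slide the cone over every frequency bin (circular correlation
    # per scale). Step 2: sum the energy to get the frequency marginal.
    s = scalogram(psd, scales)
    corr = np.fft.ifft(np.fft.fft(s, axis=1) * np.conj(np.fft.fft(mask, axis=1)),
                       axis=1)
    marginal = corr.real.sum(axis=0)
    # Step 3: one extremum on each side of the spectral centroid.
    centre = int(np.sum(np.arange(n) * psd) / np.sum(psd))
    low = int(np.argmax(marginal[:centre]))
    high = centre + int(np.argmax(marginal[centre:]))
    return low, high
```

The estimated bandwidth is then `high - low` bins. Replacing the weighted mask by `(mask > 0).astype(float)` gives the binary variant discussed above.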
3.3. Results and performance
We apply the proposed algorithm to 10,000 trials of simulated OFDM signals. These signals are generated with 4096 samples, with 4 samples per symbol. The PSD is evaluated by using 1024 points. We have simulated exactly the binary random sequence for SNR equal to 10, 5, 3 and 0 dB. Moreover, we have studied the effects of bad synchronization by considering time and frequency offsets (the time offset is smaller than $T_u$ and the frequency offset does not exceed 5 % of the bandwidth of the signal).
3.3.1. Noise influence
Fig. 4 shows the results of bandwidth estimation for different SNR. The proposed algorithm determines the bandwidth with an error lower than 4 % for 97 % of the signals when SNR = 3 dB. However, we observe a strong degradation of the performance as the SNR goes to 0 dB.
Figure 4: Noise influence: estimation performance for different SNR, no time or frequency offsets.
3.3.2. Time and frequency offset influence
Fig. 5 shows the results obtained for different SNR in the case where the frequency offset $\delta f_0$ is non-zero. The new scalogram is essentially the original scalogram translated by $\delta f_0$. Consequently, the bandwidth remains the same and the performance is still good. The time offset $\delta t_0$ is equivalent to a new phase for the signal. Since we evaluate its PSD, the phase no longer has any influence and the performance is strictly the same as in the case $\delta t_0 = 0$.
Figure 5: Frequency offset influence: estimation performance for different SNR, no time offset.
3.3.3. Conclusion concerning the method
The proposed method is efficient for SNR down to 3 dB, even in the case of a time or frequency offset. By using the PSD of the received signal, all phase perturbations are removed. For SNR down to 3 dB, we can conclude that the bandwidth is estimated with an error lower than 8 % in 100 % of the cases.
4. ESTIMATION OF THE NUMBER OF
SUB-CARRIERS
4.1. Theoretical covariance matrix
In this problem, we receive one signal which is made of $N_p$ components. We then compute the coefficients of the covariance matrix, called R. For each time delay $\tau_n$ in the interval $[0 ; N_p - 1]$, the covariance term can be expressed as:
$$r(\tau_n) = \frac{1}{N_e - \tau_n} \sum_{q=\tau_n+1}^{N_e} x(q)\, x^*(q - \tau_n) \qquad (2)$$
where $N_e$ denotes the number of samples of the received signal. We can notice that this estimator is unbiased. In the case where $\tau_n = 0$, we have:
$$r(0) = \frac{1}{N_e} \sum_{q=1}^{N_e} |x(q)|^2 + \sigma_b^2$$
where $\sigma_b^2$ is the variance of the noise. If $\tau_n \neq 0$, we have:
$$r(\tau_n) = \frac{1}{N_e - \tau_n} \sum_{q=1+\tau_n}^{N_e} x(q)\, x^*(q - \tau_n) = \frac{1}{N_e - \tau_n} \sum_{q=1+\tau_n}^{N_e} x^2(q)\, e^{2i\pi \Delta f \tau_n}$$
and we then consider $\omega_n = 2\pi \Delta f \tau_n$, which depends on $\tau_n$.
We can then form the covariance matrix as:
$$R = \begin{pmatrix} r(0) & r(1) & \cdots & r(N_p-2) & r(N_p-1) \\ r^*(1) & r(0) & \cdots & r(N_p-3) & r(N_p-2) \\ \vdots & & \ddots & & \vdots \\ r^*(N_p-1) & r^*(N_p-2) & \cdots & r^*(1) & r(0) \end{pmatrix}$$
Considering the value of $r(\tau_n)$ in the cases $\tau_n = 0$ and $\tau_n \neq 0$, this matrix can also be written, up to the normalization factors of (2), as:
$$R = \begin{pmatrix} \sum_{q=1}^{N_e} x_q^2 + \sigma_b^2 & \cdots & \sum_{q=1}^{N_e} x_q^2\, e^{i(N_p-1)\omega_n} \\ \vdots & \ddots & \vdots \\ \sum_{q=1}^{N_e} x_q^2\, e^{-i(N_p-1)\omega_n} & \cdots & \sum_{q=1}^{N_e} x_q^2 + \sigma_b^2 \end{pmatrix}$$
This matrix is Hermitian (conjugate-symmetric) and its form is the same as in the cases for which the MUSIC algorithm is used. It can therefore be diagonalized by eigenvalue decomposition [6]. After the diagonalization process, the autocorrelation matrix becomes:
$$\mathrm{diag}\left(\lambda_1, \lambda_2, \ldots, \lambda_{N_p}, \sigma_b^2, \ldots, \sigma_b^2\right)$$
where $\lambda_1, \lambda_2, \ldots, \lambda_{N_p}$ are the eigenvalues due to the contribution of the useful signal plus noise. Normally, $\lambda_i > \sigma_b^2 \;\; \forall i \in \{1, 2, \ldots, N_p\}$. The matrix thus contains $N_p$ eigenvalues which are larger than the noise variance, from which the number of sub-carriers can be deduced.
Several solutions are possible to determine which eigenvalues are due to the contribution of the sub-carriers. As the channel has been surveyed before the signal started, we can assume that the variance of the noise has been estimated, with of course some uncertainty. A second solution is to plot the eigenvalues on a single diagram in increasing order and to detect a breaking point. But, in the case of fading, the contributions of some sub-carriers become lower and the breaking point is impossible to find. Another solution is to use a decision criterion: Akaike’s or Rissanen’s criterion. Akaike’s criterion is better suited since it tends to overestimate the number of sources if the signal is oversampled, which could be helpful in the case of fading. Moreover, this method is efficient only if there are at least two noise contributions, that is, the number of correlation terms must be at least equal to $(N_p + 1)$. The problem is that $N_p$ is unknown and has to be estimated. The proposed solution is to start the algorithm with an a priori number of sub-carriers and to iterate this process until one eigenvalue corresponding to noise or a breaking point appears.
4.2. Proposed algorithm
The first algorithm we propose is the following:
1. Fix a priori the size of the matrix: $N_e$.
2. Using equation (2), compute the $N_e$ autocorrelation terms and form the correlation matrix.
3. Diagonalize the matrix and apply Akaike’s criterion. If the dimension of the signal sub-space (i.e. the number of sub-carriers) is equal to $N_e$, set $N_e = 2 \cdot N_e$ and go back to step 1.
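Steps 2 and 3 of this algorithm can be sketched as below. The paper does not give the expression of Akaike's criterion it uses; the sketch assumes the common eigenvalue-based form, AIC(k) = -2 N (p - k) log(geometric mean / arithmetic mean of the p - k smallest eigenvalues) + 2k(2p - k). The function names and the clipping of non-positive eigenvalues are also assumptions.

```python
import numpy as np

def autocorr(x, max_lag):
    """Unbiased estimates of r(tau) = mean of x(q) x*(q - tau), tau < max_lag."""
    n = len(x)
    return np.array([np.sum(x[tau:] * np.conj(x[:n - tau])) / (n - tau)
                     for tau in range(max_lag)])

def covariance_matrix(r):
    """Hermitian Toeplitz matrix whose first row is r(0), r(1), ..."""
    p = len(r)
    lag = np.arange(p)[None, :] - np.arange(p)[:, None]   # column - row
    return np.where(lag >= 0, r[np.abs(lag)], np.conj(r[np.abs(lag)]))

def aic_order(eigenvalues, n_samples):
    """Number of signal eigenvalues selected by Akaike's criterion."""
    # The unbiased Toeplitz estimate is not guaranteed positive definite,
    # so eigenvalues are clipped before taking logarithms (an assumption).
    lam = np.sort(np.maximum(np.real(eigenvalues), 1e-12))[::-1]
    p = lam.size
    scores = []
    for k in range(p):
        noise = lam[k:]
        # log(geometric mean / arithmetic mean) of the presumed noise part.
        log_ratio = np.mean(np.log(noise)) - np.log(np.mean(noise))
        scores.append(-2 * n_samples * (p - k) * log_ratio + 2 * k * (2 * p - k))
    return int(np.argmin(scores))
```

In the iteration described above, the matrix size would be doubled whenever `aic_order` returns its largest possible value, since no noise eigenvalue has yet appeared.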
4.3. Results
We apply the proposed algorithms to simulated OFDM signals. We simulate 10,000 OFDM signals using 10,000 trials to generate the corresponding symbols. Each signal is nominally generated with 50,000 samples and contains 64 sub-carriers. The frequency offset is limited to 10 % of the bandwidth of the signal. The channel is the urban channel (COST 207), in order to compare the decision criteria. We apply the MUSIC algorithm with Akaike’s criterion (except in figure 8).
4.3.1. Noise influence
In the first case, we look at the influence of noise. We generate OFDM signals for different signal-to-noise ratios (20, 10 and 5 dB). Fig. 6 shows that the performance is quite good down to 10 dB, but becomes poor for 5 dB and less. Then, we study the influence of the number of signal samples, since we use estimators of the autocorrelation terms. The SNR is fixed to 20 dB, and the signals are tested with 50,000, 40,000 and 30,000 samples respectively. As expected, the performance decreases with the number of samples. Nevertheless, 50,000 samples are enough to obtain good performance (Fig. 7). Lastly, we compare Rissanen’s and Akaike’s criteria in the case of a signal with 50,000 samples and SNR = 20 dB and 10 dB. Since it tends to overestimate the dimension of the signal sub-space, Akaike’s criterion performs better than Rissanen’s (Fig. 8).
Figure 6: Noise influence in the estimation of the number of sub-carriers.
Figure 7: Influence of the number of points in the estimation of the number of sub-carriers. SNR=20 dB.
4.4. Conclusion concerning the method.
This method is quite efficient to estimate the number of sub-carriers for SNR down to 10 dB and for 50,000 samples (that is, about 1500 OFDM symbols). Akaike’s criterion is more appropriate than Rissanen’s, but we should test the “Minimum Description Length” criterion.
5. CONCLUSION
The proposed methods to estimate the bandwidth and the number of sub-carriers are quite efficient for few samples and low SNR (lower than 10 dB). Concerning the bandwidth estimation, we obtain an estimate with an error lower than 8 % in 100 % of the cases for SNR down to 3 dB. Concerning the estimation of the number of sub-carriers, we obtain an estimate with an error lower than 9 % in 100 % of the cases for SNR down to 10 dB. The performance can be improved using denoising algorithms [3] and compared with time-domain methods that we are currently developing [4]. This work completes our detection algorithm and can be used for future applications of synchronization and equalization.
Figure 8: Influence of the decision criterion on the estimation of the number of sub-carriers. 10,000 trials, 50,000 samples, SNR=20 and 10 dB, urban channel (COST 207).
REFERENCES
[1] A. Cohen “Ondelettes et traitement numérique du signal” ed. MASSON, 1992, 205 pages.
[2] W. Akmouche “Detection of multi-carrier modulations using 4th-order cumulants.” Proc. of the MILCOM, session 15, Atlantic City, 01-03/11/1999.
[3] E. Kerherve, W. Akmouche, A. Quinquis “Wavelet and noise reduction: application to the time features estimation for OFDM signals.” Proc. of the ICSPAT, Orlando (Florida), USA, Oct. 1999.
[4] W. Akmouche, E. Kerherve, A. Quinquis “Estimation of OFDM signal parameters: time parameters.” submitted to Globecom 2000, Nov. 2000.
[4] J.-C. Pesquet, H. Krim, H. Carfantan, J. G. Proakis “Estimation of noisy signals using time-invariant wavelet packets” Proc. of the IEEE, 1993, pp. 31-34.
[5] A. Teolis “Computational signal processing with wavelets” Proc. of the IEEE, 1993, pp. 31-34.
[6] E. H. Attia “Efficient computation of the MUSIC algorithm as applied to a low-angle elevation estimation problem in a severe multipath environment” Ed. Birkhauser, 1998, 324 pages.
BLIND SOURCE SEPARATION OF NONSTATIONARY CONVOLUTIVELY
MIXED SIGNALS
Brian S. Krongold and Douglas L. Jones
Department of Electrical and Computer Engineering & Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Urbana, IL 61801
ABSTRACT
Many algorithms for blind source separation (BSS) have
been introduced in the past few years, most of which
assume statistically stationary sources as well as instan¬
taneous mixtures of signals. In many applications, such
as separation of speech or fading communications sig¬
nals, the sources are nonstationary. Furthermore, the
source signals may undergo convolutive (or dynamic)
linear mixing, and a more complex BSS algorithm is re¬
quired to achieve better source separation. We present
a new BSS algorithm for separating linear convolutive
mixtures of nonstationary signals which relies on the
nonstationary nature of the sources to achieve sepa¬
ration. The algorithm is an on-line, LMS-like update
based on minimizing the average squared cross-output-
channel-correlations along with unity average energy
output in each channel. We explain why, for nonsta¬
tionary signals, such a criterion is sufficient to achieve
source separation regardless of the signal statistics.
1. INTRODUCTION
The separation of multiple unknown sources from multisensor data has many applications, including the isolation of individual speech signals from a mixture of simultaneous speakers (as in video conferencing or the often-cited “cocktail party” environment), the elimination of cross-talk between horizontally and vertically polarized microwave communications transmissions, and the separation of multiple cellular telephone signals at a base station. In the past decade or so, a number of significant methods have been introduced for blind source separation, of which we review a few of the most popular here. One of the earliest and most effective methods (yet relatively unknown in some circles) is a constant-modulus-based method published in 1985 by Treichler and Larimore [1]. This method achieves simultaneous separation and equalization by minimizing the deviation of the separated output magnitudes from a fixed gain. This method is very simple and convenient and works well even for non-constant-modulus signals with a sub-Gaussian kurtosis (which includes most communications signals).
This work was supported by the National Science Foundation, grant no. CCR-9979381.
Jutten and Herault introduced one of the most pop¬
ular methods [2]. This method works well in many ap¬
plications, particularly cross-talk situations in which a
relatively modest amount of mixing occurs. For more
challenging scenarios, the existence of multiple min¬
ima and misconvergence of the widely used Jutten-
Herault algorithm has been examined in the literature
[3]-[4]. Methods for non-Gaussian sources have also
been developed, including [5] and others¹. More recently, methods based on second-order statistics (and
which can thus work even for Gaussian sources) have
been introduced. A method by Belouchrani, et al. can
separate stationary Gaussian sources with different au¬
tocorrelation statistics [6].
In many applications of blind source separation,
the received signals are nonstationary. Nonstationar-
ity may arise either from the source signals themselves
(such as speech), or from channel impairments (such
as fading in wireless communications channels). Most
techniques for blind source separation assume station-
arity of the signals and depend on reliable estimation
of second-order or higher-order statistics. These meth¬
ods may have difficulty when applied to nonstationary
signals.
Several methods developed explicitly for nonstationary source separation have been published recently. Belouchrani and Amin have developed a time-frequency extension of the method in [7] for nonstationary sources, and Parra, et al. have developed another method based on frequency decomposition of several successive blocks of time [8]. While these methods appear effective, and the latter can also separate convolutive mixtures, they are block-based methods requiring somewhat sophisticated and expensive processing. Matsuoka, et al. present an on-line, adaptive extension of the Jutten-Herault method which, somewhat like the method we proposed in [9], attempts to minimize the average cross-correlation between separated channels while normalizing the output energy [10].
¹It should be noted here that the CMA-based method by Treichler and Larimore also depends on the sub-Gaussianity of the sources.
In various situations, convolutive (or dynamic) mix¬
ing occurs rather than instantaneous mixing. This com¬
plicates the BSS problem and requires a more sophisti¬
cated and computationally complex solution. Although
the convolutive mixture problem is not as widely published as the instantaneous problem, methods for solving the problem are discussed in [11]-[12].
In this paper, we extend our work in [9] to con¬
volutive mixtures to obtain a method for blind source
separation of nonstationary, convolutively mixed sig¬
nals which requires only nonstationarity and indepen¬
dence of the sources to achieve separation. An on-line,
LMS-like algorithm is derived which achieves separa¬
tion while normalizing the average energy of each out¬
put channel. This simple algorithm also offers tracking
capability for time-varying convolutive mixtures. The
optimization criterion is presented in the second sec¬
tion of this paper, the adaptive algorithm is derived in
the third section, and simulations which illustrate its
performance are presented in the fourth section. Some
perspectives on the results are discussed in the final
section.
2. A NONSTATIONARY CONVOLUTIVELY
MIXED SOURCE SEPARATION
CRITERION
The general source separation problem with convolutive mixtures can be described as
$$x(n) = \sum_{m=-\infty}^{n} A(n - m)\, s(m), \qquad (1)$$
where $s(n)$ is a vector of $M$ zero-mean, statistically independent source processes at time-sample $n$, $x(n)$ is a vector of $N$ sensor measurements, $N \ge M$, and $A(n)$ is an $N \times M$ mixing filter matrix. The goal of blind source separation is to determine an $M \times N$ de-mixing matrix of filters $B(n)$ for $n = 0, \ldots, L-1$, which, when applied to the received sensor data as in
$$y(n) = \sum_{m=0}^{L-1} B(m)\, x(n - m), \qquad (2)$$
recovers (separates) the individual sources up to an unknown permutation and unknown channel gains, which cannot be uniquely determined without additional information [10].
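Both the mixing model (1), truncated to a finite number of taps, and the de-mixing operation (2) are matrix FIR filters, so a single routine can illustrate them. The name `matrix_fir`, the array layout and the zero initial conditions are assumptions of this sketch.

```python
import numpy as np

def matrix_fir(taps, u):
    """Apply a causal matrix FIR filter: out(n) = sum_m taps[m] @ u(n - m).

    taps has shape (n_taps, n_out, n_in); u has shape (n_in, n_samples).
    Samples before n = 0 are taken as zero.
    """
    taps = np.asarray(taps, dtype=float)
    u = np.asarray(u, dtype=float)
    n_samples = u.shape[1]
    out = np.zeros((taps.shape[1], n_samples))
    for m, tap in enumerate(taps):
        if m >= n_samples:
            break
        # Contribution of lag m: tap applied to the input delayed by m.
        out[:, m:] += tap @ u[:, :n_samples - m]
    return out
```

With mixing taps `A` of shape (K, N, M) and de-mixing taps `B` of shape (L, M, N), `x = matrix_fir(A, s)` produces the sensor signals and `y = matrix_fir(B, x)` the separated outputs.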
An important problem with convolutive mixtures
is that even complete separation may not recover the
exact original s(n) source signals. Due to the blind
nature of the problem and the memory introduced by
the convolutive mixing, it may be impossible to ob¬
tain the true source signals, and instead filtered ver¬
sions may result without further assumptions on the
source signals. It is for this reason that convolutive-
mixture-BSS-algorithm performance can be viewed, as
in [11], by how well a system separates two sources
without any regard to how the output signals compare
to their unfiltered source versions. A way to quantify
this separation performance is to see how well (statis¬
tically) uncorrelated the output signals are. In this
paper though, our methods perform joint separation-
equalization, and this should work well for a certain
class of source signals. Our simulations compare the
output signals to the original source signals and quan¬
tify the performance in terms of signal-to-interference
ratio (SIR).
It has been observed in many papers on blind source
separation that a necessary condition for the separa¬
tion of zero-mean, statistically independent sources is
that the cross-correlations of the output channels equal
zero. However, this is not a sufficient condition, as is
well known (see [9] for an example demonstrating this).
For sources with fixed variances, an ambiguity exists as
there are an infinite number of demixing matrices which
obtain zero cross-channel correlation. For any arbitrary
pair of variances, the classes of decorrelating matrices
are different for different source variances, and only a
true separating solution yields zero cross-channel cor¬
relation for all variance combinations. This is the key
insight on which nonstationary blind source separation
algorithms are based. In effect, these methods take
multiple snapshots of the short-time cross-correlations
at different times, and by minimizing all of these si¬
multaneously, they exploit the changes in the relative
channel variances to find a truly separating solution.
This paper uses the same basic insight, but proposes
a new criterion for exploiting it which leads to a par¬
ticularly simple and convenient algorithm. We propose
to minimize the following criterion:
$$E\left[ \sum_{l=0}^{L-1} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \neq i}}^{M} \hat{r}^2_{y_i y_j}(l) + \sum_{i=1}^{M} \left( \hat{r}_{y_i y_i}(0) - 1 \right)^2 \right] \qquad (3)$$
where at time $n$
$$\hat{r}_{y_i y_j}(l; n) = \sum_{k} h(k)\, y_i(n - k - l)\, y_j(n - k) \qquad (4)$$
and $h(k)$ is a lowpass averaging filter for computing a short-term estimate of the cross-correlation of output channels $y_i$ and $y_j$ at time $n$ and lag $l$. The first term in the criterion minimizes the average squared magnitude of the short-term cross-correlations for the first $L$ lags of the output signals (which, as discussed above and in [10], should only be achieved for nonstationary signals by a separating solution), while the second term demands that the output signals in each channel have unit energy on average. In a sense, the second criterion adds a signal normalization feature to the algorithm, but as was shown in [1], this CMA criterion has the ability to jointly separate and equalize sub-Gaussian signals. In instances where the source signals are sub-Gaussian (or one or more of them are), the added CMA criterion greatly aids in separating as well as equalizing in order to obtain closer estimates of the original $s(n)$ source signals.
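The key insight of this section, that decorrelation at a single pair of source variances is ambiguous while decorrelation at several variance pairs is not, can be checked numerically on a two-source instantaneous example. The matrices and variances below are illustrative values chosen for this sketch, not taken from the paper; `c` stands for the global (mixing followed by de-mixing) system.

```python
import numpy as np

def output_covariance(c, source_var):
    """Covariance of y = C s for independent sources with given variances."""
    return c @ np.diag(source_var) @ c.T

theta = 0.6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
var_a = np.array([1.0, 4.0])   # source variances at one time
var_b = np.array([4.0, 1.0])   # source variances at a later time

# Whitening for var_a followed by a rotation: decorrelates, does not separate.
c_mixed = rotation @ np.diag(1 / np.sqrt(var_a))
cross_a = output_covariance(c_mixed, var_a)[0, 1]
cross_b = output_covariance(c_mixed, var_b)[0, 1]

# A truly separating global system (here the identity) decorrelates both.
cross_sep = [output_covariance(np.eye(2), v)[0, 1] for v in (var_a, var_b)]
```

Here `cross_a` vanishes although the outputs are still mixtures, whereas `cross_b` does not; only the separating system gives zero cross-correlation for both variance pairs, which is what the criterion exploits through short-time correlations taken at different times.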
3. ADAPTIVE ALGORITHM
There are many ways to construct a numerical algorithm based on the above criterion for blind nonstationary source separation, yielding different tradeoffs in terms of computational efficiency, convergence rate, block-based or adaptive forms, etc. However, in many applications, a simple, adaptive method which can track slow variations in the mixing parameters is desired. We derive here a stochastic gradient (LMS-like) algorithm which has these characteristics.
Many of the most successful adaptive algorithms are based on a stochastic gradient update using an instantaneous approximation to the expectation in the optimization criterion. For the optimization of the demixing matrices $B(l)$, a stochastic gradient update takes the form
$$B_{n+1}(l) = B_n(l) - \mu \nabla_n(l) \quad \text{for } l = 0, \ldots, L-1, \qquad (5)$$
where
$$\left[\nabla_n(l)\right]_{pq} = \frac{\partial}{\partial b_{pq}(l)} \left[ \sum_{l'=0}^{L-1} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \neq i}}^{M} \hat{r}^2_{y_i y_j}(l'; n) + \sum_{i=1}^{M} \left( \hat{r}_{y_i y_i}(0; n) - 1 \right)^2 \right] \qquad (6)$$
and $p$ and $q$ are the row and column indices of the gradient matrix. Note the use of the instantaneous value at time $n$ of the error function in (3) in the gradient computation. The $(p, q)$th element of the gradient matrix at lag $k$ can easily be shown to be
$$\nabla_{pq,n}(k) = 2 \sum_{l=0}^{L-1} \sum_{\substack{j=1 \\ j \neq p}}^{M} \hat{r}_{y_p y_j}(l)\, \hat{r}_{x_q y_j}(l - k) + 2 \left( \hat{r}_{y_p y_p}(0) - 1 \right) \hat{r}_{y_p x_q}(k) \qquad (7)$$
We now derive efficient recursive updates for the short-term correlation estimates for a convenient form of the averaging filter. For computational efficiency, we select a first-order IIR averaging filter with impulse response
$$h(k) = \alpha^k u(k) \qquad (8)$$
where $u(k)$ is the unit step function and $0 < \alpha < 1$. With this form, the correlation statistics can easily be updated recursively according to
$$\hat{r}_{y_i y_j}(l; n+1) = \alpha\, \hat{r}_{y_i y_j}(l; n) + y_i(n - |l|)\, y_j(n), \qquad (9)$$
and similarly
$$\hat{r}_{y_i x_j}(l; n+1) = \alpha\, \hat{r}_{y_i x_j}(l; n) + y_i(n - |l|)\, x_j(n) \qquad (10)$$
for all lags l which are required for the algorithm. This
completes the following simple recursive algorithm for
nonstationary blind source separation.
1. Compute output according to (2).
2. Update short-time correlations using (9) and (10).
3. Compute separation filter gradient using (7).
4. Update separation filters as in (5).
5. Go back to step 1.
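The recursive update used in step 2 can be checked against the direct exponentially weighted sum it replaces. The sketch below assumes a non-negative lag, zero signal values before the first sample, and an indexing convention in which the value stored after processing sample n includes that sample; the function names are hypothetical.

```python
import numpy as np

def recursive_corr(yi, yj, lag, alpha):
    """Run r <- alpha * r + yi(n - lag) * yj(n) over all samples n."""
    r, history = 0.0, []
    for n in range(len(yj)):
        past = yi[n - lag] if n - lag >= 0 else 0.0
        r = alpha * r + past * yj[n]
        history.append(r)
    return np.array(history)

def direct_corr(yi, yj, lag, alpha):
    """Evaluate sum_k alpha**k * yi(n - k - lag) * yj(n - k) for each n."""
    out = []
    for n in range(len(yj)):
        total = 0.0
        # Only terms with a non-negative index n - k - lag contribute.
        for k in range(n - lag + 1):
            total += alpha ** k * yi[n - k - lag] * yj[n - k]
        out.append(total)
    return np.array(out)
```

The recursion costs one multiply-add per correlation term and per sample, whereas the direct sum grows with n, which is why the first-order IIR form of h(k) keeps the on-line algorithm cheap.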
The complexity of the algorithm in the instantaneous mixture case was shown in [9] to be $O(M^2 N)$. Extension to the convolutive mixture case yields increased complexity by a factor of $L^2$, where $L$ is of course a chosen parameter which can be used to trade off complexity and quality of separation.
4. SIMULATIONS
Several simulations have been performed to confirm the
efficacy of the proposed method. For the following sim¬
ulation with two sources and sensors, the mixing ma¬
trices are:
1 -.5
.7 1.3
.35 -.3
-.2 .6
-.2
.15
(11)
where the first matrix represents zero lag, the second represents a lag of one, and the third represents a lag of two. The nonstationary sources, shown in Figure 1, are binary random signals multiplied by lowpass-filtered Gaussian signals, and may be considered a crude approximation to communications signals undergoing fading. Three mixing scenarios are simulated by considering the cases of A as above, only the first two matrices of A, and only the first matrix of A (i.e., instantaneous mixture). These mixtures are tested against our source separation algorithm with L values ranging from 1 to 4, resulting in 12 different simulations.
Figure 1: First 4000 samples of the nonstationary sources used in the simulation
Our BSS algorithm was tested in these 12 simulations,
and SIRs were computed for each of these cases, as well
as for the case where no source separation is applied
(footnote 2). When our BSS algorithm is applied, output
scaling is needed because BSS can only recover the
sources up to an unknown scale factor. Since the scaling
changes over time as the algorithm adapts, the signal
was normalized by an approximate best-fit scale factor
every 100 samples. A length-10,000 sample period was
evaluated after sufficient convergence (using small
values of μ) to obtain the resulting SIR values.
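The normalization and SIR measurement described above could be sketched as follows. The paper does not give the exact procedure, so this assumes a per-block least-squares scale factor and treats everything left after rescaling as interference; `blockwise_sir_db` is a hypothetical name.

```python
import math
import random

def blockwise_sir_db(source, output, block=100):
    """SIR in dB of `output` against `source` after per-block least-squares rescaling."""
    sig_power, err_power = 0.0, 0.0
    for start in range(0, len(source), block):
        s = source[start:start + block]
        y = output[start:start + block]
        yy = sum(v * v for v in y)
        # Scale a that minimises sum((a*y - s)^2) over this block (assumed criterion).
        a = sum(u * v for u, v in zip(s, y)) / yy if yy > 0.0 else 0.0
        sig_power += sum(u * u for u in s)
        err_power += sum((a * v - u) ** 2 for u, v in zip(s, y))
    if err_power == 0.0:
        return math.inf
    return 10.0 * math.log10(sig_power / err_power)

# Synthetic check: a scaled source plus weak interference.
rng = random.Random(1)
s = [rng.choice((-1.0, 1.0)) for _ in range(1000)]
interference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [3.0 * (u + 0.01 * v) for u, v in zip(s, interference)]
sir = blockwise_sir_db(s, y)
```

Because the scale is re-estimated in every block, the measure is insensitive to slow gain drift in the adapting filters, which is the reason given in the text for normalizing every 100 samples.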
Table 1 shows the simulation results when only the
first matrix in A is used for mixing, which results in
purely instantaneous mixing. The results show excellent
performance for all values of L, although performance
degrades slightly with increasing L. This is because
only L = 1 is needed to solve this problem; adding
unneeded adaptable coefficients degrades performance
slightly, due to misadjustment in the stochastic
gradient algorithm for the non-instantaneous
coefficients.
Table 2 shows the simulation results for length-2
mixing (i.e., only the first two matrices of A are
applied). The results clearly show a performance
degradation compared to the instantaneous mixture
results, as the memory in the mixture increases the
difficulty of separation. The L = 1 case does a fairly
poor job of signal separation, and increasing L results
in better SIR values, as expected. Another observation
is the imbalance of SIR performance between the two
source signals. This is a function of the mixing filters.

Footnote 2: In this case, the desired source signal is
chosen according to which source is dominant in the
mixture.

Table 1: Length-1 (Instantaneous) Mixture Results

BSS Type    SIR in dB
            Source 1    Source 2
None         6.8954      6.5736
L = 1       36.1091     31.7012
L = 2       33.8167     32.2873
L = 3       32.0851     28.7450
L = 4       29.7797     28.9522

Table 2: Length-2 Mixing Results

BSS Type    SIR in dB
            Source 1    Source 2
None         4.8529      4.6307
L = 1       13.3890      6.2799
L = 2       22.6628     11.3558
L = 3       27.6702     16.3054
L = 4       29.3789     21.2188
Table 3 shows the simulation results for length-3
mixing using A as in (11). The results show even
further degradation than the length-2 mixture case, as
the longer mixing filters are more difficult to invert.
Again, the performance increases with the demixing
filter length L. Further gains could be obtained by
using a larger L, but this comes at the expense of
greater complexity (proportional to $L^2$) as well as
much slower convergence.
Table 3: Length-3 Mixing Results

BSS Type    SIR in dB
            Source 1    Source 2
None         4.4030      4.3170
L = 1        9.2718      5.7749
L = 2       11.0878      9.3144
L = 3       13.6907     12.4754
L = 4       15.8287     15.0418
5. CONCLUSIONS
Effective blind source separation can be achieved by ex¬
ploiting nonstationarity of the sources. Furthermore, it
is possible to separate convolutively mixed signals with
the algorithm. This paper clearly shows performance
gains can be made over an instantaneous mixture algo¬
rithm in the presence of convolutive mixtures.
Nonstationary blind source separation algorithms
appear particularly relevant for practical applications
because many sources of interest, such as speech or fad¬
ing signals, exhibit nonstationarity but may not oth¬
erwise present features (such as non-Gaussian statis¬
tics or different auto-correlation structure) required by
other methods.
In comparison with other nonstationary blind source
separation algorithms, the method proposed here re¬
sults in a simple on-line stochastic gradient algorithm
requiring only multiplications and additions, which are
efficiently implemented in signal processing hardware.
It appears to exhibit the traditional characteristics of
LMS-like algorithms including robustness and numeri¬
cal stability, the ability to track slow variations in the
environment, and relatively slow convergence.
The computational complexity of the algorithm is
$O(N M^2 L^2)$. That is, the cost is linear in the number
of receivers, but quadratic in the number of sources
and the demixing filter lengths. For many applications,
these parameters are very small, and the algorithm is
very efficient. For larger values of L, the computational
cost may be the limiting factor in a tradeoff between
performance and complexity.
REFERENCES
[1] J. R. Treichler and M. G. Larimore, “New pro¬
cessing techniques based on the constant modu¬
lus adaptive algorithm,” IEEE Transactions on
Acoustics, Speech, and Signal Processing, vol. 33,
pp. 420-431, April 1985.
[2] C. Jutten and J. Herault, “Blind separation of
sources, part I: An adaptive algorithm based
on neuromimetic architecture,” Signal Processing,
vol. 24, pp. 1-10, July 1991.
[3] P. Comon, C. Jutten, and J. Herault, “Blind sep¬
aration of sources, part II: Problem statement,”
Signal Processing, vol. 24, pp. 11-20, July 1991.
[4] Y. Sorouchyari, “Blind separation of sources, part
III: Stability analysis,” Signal Processing, vol. 24,
pp. 21-29, July 1991.
[5] J.-F. Cardoso, “Iterative techniques for blind
separation using only fourth-order cumulants,” in
Signal Processing IV - Theories and Applications,
Proceedings of EUSIPCO-92, Sixth European Signal
Processing Conference, vol. 2, pp. 739-742, 1992.
[6] A. Belouchrani, K. A. Meraim, J.-F. Cardoso, and
E. Moulines, “A blind source separation technique
using second order statistics,” IEEE Transactions
on Signal Processing, vol. 45, pp. 434-444, Febru¬
ary 1997.
[7] A. Belouchrani and M. G. Amin, “Source separa¬
tion based on the diagonalization of a combined set
of spatial time-frequency distribution matrices,”
in Proceedings of the IEEE International Confer¬
ence on Acoustics, Speech, and Signal Processing,
ICASSP - 97, (Germany), April 1997.
[8] L. Parra, C. Spence, and B. de Vries, “Convolutive
blind source separation based on multiple decor¬
relation,” in Proceedings of 1998 IEEE Workshop
on Neural Networks for Signal Processing, (Cam¬
bridge, UK), September 1998.
[9] D. L. Jones, “A new method for blind source
separation of nonstationary signals,” in Proceedings
of 1999 IEEE International Conference on Acoustics,
Speech, and Signal Processing, ICASSP - 99,
(Phoenix, AZ, USA), March 1999.
[10] K. Matsuoka, M. Ohya, and M. Kawamoto, “A
neural net for blind separation of nonstationary
signals,” Neural Networks, vol. 8, no. 3, pp. 411-419,
1995.
[11] U. A. Lindgren and H. Broman, “Source separa¬
tion using a criterion based on second-order statis¬
tics,” IEEE Transactions on Signal Processing,
vol. 46, pp. 1837-1850, July 1998.
[12] H. L. N. Thi and C. Jutten, “Blind source separa¬
tion for convolutive mixture,” Signal Processing,
vol. 45, pp. 209-229, 1995.
A VERSATILE SPATIO-TEMPORAL CORRELATION FUNCTION FOR
MOBILE FADING CHANNELS WITH NON-ISOTROPIC SCATTERING
A. Abdi, M. Kaveh
Dept. of Elec. and Comp. Eng., University of Minnesota
Minneapolis, Minnesota 55455, USA
ABSTRACT
For the analysis and design of adaptive antenna arrays in mobile
fading channels, we need a model for the spatio-temporal
correlation among the array elements. In this paper we propose a
general spatio-temporal correlation function, where non-isotropic
scattering is modeled by von Mises distribution, an empirically-
verified model for non-uniformly distributed angle of arrival. The
proposed correlation function has a closed form and is suitable
for both mathematical analysis and numerical calculations. The
utility of the new correlation function has been demonstrated by
quantifying the effect of non-isotropic scattering on the
performance of two applications of the antenna arrays for
multiuser multichannel detection and single-user diversity
reception. Comparison of the proposed correlation model with
published data in the literature shows the flexibility of the model
in fitting real data.
1. INTRODUCTION
In recent years the application of adaptive antenna arrays (smart
antennas) for cellular systems has received much attention [1],
since they can improve the coverage, quality, and capacity of
such systems by combating interference, fading, and other
undesired disturbances. An adaptive array can be defined as an
adaptive spatio-temporal filter, which takes advantage of both
time-domain and space-domain signal characteristics. Efficient
joint use of time-domain and space-domain data demands a
generalization of conventional communication theory and signal
processing techniques to spatial and temporal communication
theory [2] and space-time signal processing techniques [3].
Needless to say, new spatio-temporal channel models have to be
developed as well. Since the second-order statistics of the
channel characterize the basic structure of stochastic mobile
channels, we need a spatio-temporal correlation function to study
the basic impact of the random channel on the performance of
space-time solutions, including the adaptive antenna arrays.
In this paper we present a flexible and versatile parametric
correlation function for the mobile station (MS) (similar results
can be obtained for the base station (BS) as well, as we see in
Section 4). We do this by generalizing the spatio-temporal
correlation function in [4], originally derived for an isotropic
scattering scenario where the MS receives signals from all
directions with equal probability, to the non-isotropic scattering
case. Note that isotropic scattering at the MS corresponds to the
uniform distribution for the angle of arrival (AOA) at the MS.
However, empirical results have shown that due to the structure
of the mobile channel, the MS is likely to receive signals only
from particular directions (see [5] and references therein). In
other words, most often the MS experiences non-isotropic
scattering, which results in a non-uniform distribution for the
AOA at the MS. In [5] it has been shown that the application of
von Mises distribution for the AOA at the MS yields an easy-to-
use and closed-form expression for the temporal (or equivalently,
spatial) correlation function. This correlation function has
exhibited very good fit to measured data [5].
In the sequel we derive a new spatio-temporal correlation
function where non-isotropic scattering is modeled by the von
Mises distribution. To show the significant effect of non¬
isotropic scattering on the performance of smart antenna systems
employing space-time data, we study the performance of an
antenna array multiuser detector equipped with a channel
estimator, operating in a Rayleigh fading channel. As a simpler
example where only space data are employed, we also investigate
the impact of non-isotropic scattering on a multi-element receiver
working as a maximal ratio combiner (MRC) in a Rayleigh
fading channel. In both examples we show how the proposed
spatio-temporal correlation function helps us in quantifying the
effect of the fading channel on the performance of antenna
arrays, in the realistic scenario of non-isotropic scattering. The
paper concludes with a comparison of the proposed correlation
model with the published correlation data, collected by a BS-
mounted array.
2. A NEW CORRELATION FUNCTION
Consider the linear uniformly-spaced antenna array shown in [4,
Fig. 2], mounted on a MS. Let $r_m(t)$ denote the complex
envelope at the $m$th element from the left. Then the normalized
correlation function between the complex envelopes of the $m$th
and the $n$th antenna elements, defined by
$\phi_{mn}(\tau) = E[r_m(t) r_n^*(t+\tau)] / E[|r_m(t)|^2]$, can be
derived from [4]:

$$\phi_{mn}(\tau) = E\left[\exp\{ j 2\pi f_d \tau \cos(\Theta - \alpha) + j (m-n) 2\pi (d/\lambda) \cos\Theta \}\right], \quad (1)$$

where $E$ denotes mathematical expectation, $j = \sqrt{-1}$, $f_d$ is
the maximum Doppler frequency, $\Theta$ stands for the AOA, $\alpha$
represents the direction of the motion of the MS with respect to
the horizontal axis counterclockwise, $d$ is the spacing between
any two adjacent antenna elements, and $\lambda$ is the wavelength.
Now we consider the von Mises probability density function
(PDF) for the random variable $\Theta$:

$$p_\Theta(\theta) = \frac{\exp[\kappa \cos(\theta - \theta_p)]}{2\pi I_0(\kappa)}, \quad \theta \in [-\pi, \pi), \quad (2)$$

where $I_0(\cdot)$ is the zero-order modified Bessel function,
$\theta_p \in [-\pi, \pi)$ accounts for the mean direction of AOA, and
$\kappa \ge 0$ controls the width of the AOA distribution [5]. For
$\kappa = 0$ (isotropic scattering) we have $p_\Theta(\theta) = 1/(2\pi)$,
while for $\kappa = \infty$ (extremely non-isotropic scattering) we
obtain $p_\Theta(\theta) = \delta(\theta - \theta_p)$, where $\delta(\cdot)$ is
the Dirac delta function. By calculating the expectation in (1)
according to (2) we obtain:

$$I_0(\kappa)\, \phi_{mn}(\tau) = I_0\left(\sqrt{\kappa^2 - x^2 - y^2 - 2xy\cos\alpha + j 2\kappa [x \cos(\alpha - \theta_p) + y \cos\theta_p]}\right), \quad (3)$$

where $x = 2\pi f_d \tau$ and $y = 2\pi (m-n) d/\lambda$. With $\kappa = 0$,
(3) reduces to Lee’s spatio-temporal correlation function
$J_0(\sqrt{x^2 + y^2 + 2xy\cos\alpha})$ in [4, Eqs. (42)-(43)] for
isotropic scattering, where $J_0(\cdot)$ is the zero-order Bessel
function. For $m = n = 1$ (single antenna), Lee’s result further
simplifies to Clarke’s classic temporal correlation function
$J_0(x)$ [6, p. 40, Eq. (2.20)]. For a single antenna experiencing
non-isotropic scattering and $\alpha = 0$, (3) reduces to the
temporal correlation function
$I_0(\sqrt{\kappa^2 - x^2 + j 2\kappa x \cos\theta_p}) / I_0(\kappa)$ derived
in [5, Eq. (2)] (this correlation function has shown very good
fit to measured data [5]).

0-7803-5988-7/00/$10.00 © 2000 IEEE
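The closed form in (3) can be checked numerically against the expectation in (1) taken over the von Mises PDF in (2). The following is a minimal sketch under stated assumptions, not part of the original derivation: the function names are hypothetical, and $I_0(\sqrt{w})$ is evaluated through its power series in $w$, which avoids choosing a branch of the complex square root but is only suitable for moderate $|w|$.

```python
import cmath
import math

def bessel_i0_of_sqrt(w, terms=60):
    """I_0(sqrt(w)) for complex w, via the series sum_k (w/4)^k / (k!)^2."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= (w / 4) / ((k + 1) ** 2)
    return total

def corr_closed_form(x, y, kappa, alpha, theta_p):
    """Right-hand side of (3) divided by I_0(kappa)."""
    w = (kappa ** 2 - x ** 2 - y ** 2 - 2 * x * y * math.cos(alpha)
         + 2j * kappa * (x * math.cos(alpha - theta_p) + y * math.cos(theta_p)))
    return bessel_i0_of_sqrt(w) / bessel_i0_of_sqrt(kappa ** 2)

def corr_numeric(x, y, kappa, alpha, theta_p, n=4096):
    """Expectation in (1) under the PDF in (2), by the rectangle rule on [-pi, pi)."""
    norm = 2 * math.pi * bessel_i0_of_sqrt(kappa ** 2).real
    acc = 0j
    for i in range(n):
        th = -math.pi + 2 * math.pi * i / n
        pdf = math.exp(kappa * math.cos(th - theta_p)) / norm
        acc += pdf * cmath.exp(1j * (x * math.cos(th - alpha) + y * math.cos(th)))
    return acc * 2 * math.pi / n

closed = corr_closed_form(1.2, 1.9, 3.0, 0.4, 0.7)
numeric = corr_numeric(1.2, 1.9, 3.0, 0.4, 0.7)
```

Setting `kappa = 0` and `y = 0` in `corr_closed_form` gives Clarke's $J_0(x)$, since $I_0(\sqrt{-x^2}) = J_0(x)$.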
In comparison with the existing spatial correlation functions for
antenna arrays [7], our proposed model in (3) has the main
advantage that it includes both space and time dimensions in a
single mathematically-tractable closed-form expression, flexible
for fitting to array data, studying the performance of various
array-based techniques [8] for different applications in fading
channels with the realistic assumption of non-isotropic
scattering, optimizing array configurations [9], etc.
3. TWO ARRAY APPLICATIONS
In this section we use the proposed model in (3) for two array-
based applications. In the first one we need a spatio-temporal
correlation function, while for the second one a spatial-only
correlation function is needed. In array applications, the need for
a spatio-temporal correlation function also appears in
conjunction with such important fading characteristics as level
crossing rate and average fade duration [4] [10], which due to
space limitations we do not address here.
3.1 Efficiency of Two Multiuser Multichannel
Array Detectors
For code division multiple access (CDMA) signals, two
array-based multiuser detection schemes with imperfect estimates
of the fading channel were recently investigated in [11]: the
decision-directed detector (with more complexity), which is
optimum, and the decorrelating detector (with less complexity),
which is suboptimum. In terms of the asymptotic efficiency, it has
been proven that the decision-directed detector is superior.
However, the decorrelating detector is simpler to implement. So,
it is of interest to determine how much these two detectors differ
in terms of asymptotic efficiency. Here, by a simple example [12,
p. 107 and p. 117] we show that the answer strongly depends on
the mode of scattering, which affects the correlation function of
the complex envelope in the fading channel.
Assume that the MS has a two-element antenna ($M = 2$), and
there are two mobile users ($K = 2$) according to the
configuration shown in Figs. 1 and 2 ($\theta_{p,1} = 0$,
$\theta_{p,2} = \pi$). In Fig. 1 we have $\kappa_1 = \kappa_2 = 0$, where
the MS receives scattered plane waves from all directions with
equal probability, while in Fig. 2, where $\kappa_1 = \kappa_2 = 10$,
the MS receives directional waves from two specific directions
(the beamwidth in each direction is equal to
$BW = 2/\sqrt{\kappa} \approx 36°$ [5]). Suppose the first user is the
desired user, while the second one is the interfering user. The
MS moves from left to right ($\alpha = 0$) and the users travel at
speeds such that the desired user has the maximum Doppler
frequency $f_{d,1} = 0.1$ Hz, while the interfering user has the
maximum Doppler frequency $f_{d,2} = 0.05$ Hz. Assume the
correlation coefficient between the users’ signature waveforms is
$\rho_{12} = 0.5$, and the MS uses only the past two values
($l = 2$) of matched filter outputs and bit decisions for fading
estimation and bit detection in the presence of Rayleigh fading
and zero-mean additive white Gaussian noise with variance
$\sigma^2$. Suppose both users have (equal) unit power. Let us
define the signal-to-noise ratio (SNR) as $\gamma = 1/\sigma^2$. For
$d = 0.3\lambda$ and $\lambda$, the asymptotic efficiency of the desired
user, $\eta_1$, calculated using the equations given in [12], is
plotted in Figs. 3 and 4 versus SNR, assuming
$\kappa_1 = \kappa_2 = 0$ and $\kappa_1 = \kappa_2 = 10$. According to both
figures, as $\kappa$ increases (more directional reception), the
efficiency of both detectors increases significantly (which is
good news). However, the difference between the detectors’
efficiencies increases as well, which implies that choosing the
decorrelating detector, due to its lower complexity, introduces a
significant loss in efficiency when we have non-isotropic
scattering. Hence, we need to develop new suboptimum
low-complexity detectors with efficiencies comparable with the
optimum detector, in channels with directional reception.
3.2 Average Bit Error Rate of a Single-User
Multichannel Array Detector
Assume that in Figs. 1 and 2, we have user one only ($K = 1$),
and $\theta_p = 0$. Moreover, both the MS and the user are stationary
($f_d = 0$). The user sends data using the binary phase shift
keying (BPSK) modulation scheme, and the MS is equipped with a
two-branch ($M = 2$) maximal ratio combiner (MRC). The average
bit error rate (BER) in this case is given by [13, Eq. (12)]:

$$P_b(\gamma) = \frac{1}{2}\left[1 - \frac{1+\rho}{2\rho}\sqrt{\frac{\gamma(1+\rho)}{1+\gamma(1+\rho)}} + \frac{1-\rho}{2\rho}\sqrt{\frac{\gamma(1-\rho)}{1+\gamma(1-\rho)}}\right], \quad (4)$$

where $\rho = |\phi_{12}(0)|$. In Figs. 5 and 6 we have plotted
$P_b(\gamma)$ versus $\gamma$ for $d = 0.3\lambda$ and $\lambda$,
respectively. As we expect, the average BER increases as $\kappa$
increases, because it results in more correlation between the
branches. Of course, a larger $d$ can reduce the amount of
correlation between branches, resulting in smaller average BER
(compare Figs. 5 and 6).
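To make the dependence of the BER on branch correlation concrete, the sketch below evaluates the standard expression for BPSK with two-branch MRC over correlated Rayleigh fading, $P_b = \frac{1}{2}[1 - \frac{1+\rho}{2\rho}\sqrt{\frac{\gamma(1+\rho)}{1+\gamma(1+\rho)}} + \frac{1-\rho}{2\rho}\sqrt{\frac{\gamma(1-\rho)}{1+\gamma(1-\rho)}}]$. That this is the exact form of (4) is an assumption here; `ber_bpsk_mrc2` is a hypothetical name, and `gamma` is the per-branch average SNR on a linear (not dB) scale.

```python
import math

def ber_bpsk_mrc2(gamma, rho):
    """Average BPSK BER, two-branch MRC, Rayleigh fading, branch correlation rho in [0, 1]."""
    def f(u):
        return math.sqrt(gamma * u / (1.0 + gamma * u))
    if rho < 1e-8:
        # Limit rho -> 0 (independent branches), where the general form is 0/0.
        mu = f(1.0)
        return (2.0 - 3.0 * mu + mu ** 3) / 4.0
    plus = (1.0 + rho) / (2.0 * rho) * f(1.0 + rho)
    minus = (1.0 - rho) / (2.0 * rho) * f(1.0 - rho)
    return 0.5 * (1.0 - plus + minus)

# BER at 10 dB per-branch SNR for increasing branch correlation.
snr_linear = 10.0 ** (10.0 / 10.0)
curve = [(rho, ber_bpsk_mrc2(snr_linear, rho)) for rho in (0.0, 0.3, 0.6, 0.9, 1.0)]
```

At `rho = 1` the two branches are fully correlated and the expression collapses to the single-branch result with doubled SNR, which is the worst case for diversity.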
4. COMPARISON WITH DATA
Although the application of antenna arrays in both MS and BS is
advantageous, in this section we focus on BS since the
application of arrays at the BS is more common (practical
constraints usually restrict the use of an array of antennas at a
MS). For statistical characterization of narrow histograms of the
AOA of waves impinging the BS [14] [15] (which gives rise to
the non-uniform distribution of power versus the azimuth angle
[16]), three different PDF’s have been used so far in the
literature: cosine [17], Gaussian [18], and truncated uniform
[19]. All these PDF’s are considered primarily for studying the
effect of non-uniformly distributed AOA on the spatial
correlation among the
array elements at a BS. With appropriate choice of parameters,
these three PDF’s can resemble visually the narrow histograms of
the AOA at the BS (although the truncated uniform PDF is less
likely to do that because the empirical histograms are usually
bell-shaped [14] [15] and decay to zero not as abruptly as a
truncated uniform PDF). So, mathematical convenience seems to
be the main concern in choosing a PDF for the AOA, among
empirically-acceptable candidates. From this point of view, none
of these three PDF’s are able to provide a simple closed-form
solution (in terms of known mathematical functions) for the
correlation between the complex envelopes of the array elements
(which is a basic quantity in array-related studies). For the
Gaussian PDF only approximate results can be found [18] [20],
and for the truncated uniform PDF, closed-form results can be
derived only for inline and broadside cases [21] (the cosine PDF
is less likely to yield a closed-form answer because of the special
integral that has to be solved). On the other hand, as we see in
the sequel, von Mises PDF yields a simple and compact
expression, given in (5), which is basically the same as (3). This
makes the von Mises PDF a very suitable model.
Comparison of the Gaussian PDF with the histograms of AOA
data has shown reasonable agreement [15] [22]. This is good
empirical support for the von Mises PDF because for large
$\kappa$, the PDF in (2) resembles a small-variance Gaussian PDF
with mean $\theta_p$ and standard deviation $1/\sqrt{\kappa}$ [23, p.
60]. In fact, for any beamwidth (angle spread) smaller than 40°
(which corresponds to $\kappa > 8.2$ according to the definition of
beamwidth as $BW = 2/\sqrt{\kappa}$ in [5]), the plots of the
Gaussian and von Mises PDF’s are indistinguishable (two typical
standard deviations for the Gaussian PDF are 15° [22] and 6°
[15], which correspond to $\kappa = 14.6$ and $\kappa = 91.2$,
respectively). However, recall that the von Mises PDF is able to
provide a general and closed-form solution for the space-time
correlation between the complex envelopes of the array elements,
while the Gaussian PDF cannot.
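The numerical values quoted above follow directly from $BW = 2/\sqrt{\kappa}$ and from the large-$\kappa$ Gaussian approximation with standard deviation $1/\sqrt{\kappa}$, with angles converted to radians. A small sketch (the helper names are hypothetical):

```python
import math

def kappa_from_beamwidth_deg(bw_deg):
    """Invert BW = 2 / sqrt(kappa), with BW given in degrees."""
    return (2.0 / math.radians(bw_deg)) ** 2

def kappa_from_std_deg(sigma_deg):
    """Large-kappa Gaussian approximation: sigma = 1 / sqrt(kappa)."""
    return 1.0 / math.radians(sigma_deg) ** 2

def beamwidth_deg(kappa):
    """Beamwidth in degrees for a given kappa."""
    return math.degrees(2.0 / math.sqrt(kappa))

kappa_40 = kappa_from_beamwidth_deg(40.0)   # about 8.2
kappa_15 = kappa_from_std_deg(15.0)         # about 14.6
kappa_6 = kappa_from_std_deg(6.0)           # about 91.2
bw_10 = beamwidth_deg(10.0)                 # about 36 degrees
```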
Using exactly the same notation as [17], it is straightforward to
show that for the linear uniformly-spaced antenna array at the BS
in [17, Fig. 6] we have:

$$I_0(\kappa)\, \phi_{mn}(\tau) = I_0\left(\sqrt{\kappa^2 - x^2 - y^2 + 2xy\cos\gamma + j 2\kappa [x \cos(\gamma - \alpha) - y \cos\alpha]}\right), \quad (5)$$

provided that the AOA has a von Mises PDF with the mean direction
$\alpha \in [-\pi, \pi)$ and the width control parameter $\kappa \ge 0$.
All of the parameters in (5) are the same as in (3), except for
$\gamma$ in (5), which represents the direction of the motion of the
MS with respect to the horizontal axis counterclockwise, in place
of $\alpha$ in (3) (the $\gamma$ here should not be confused with the
SNR symbol $\gamma$ used in Section 3). The two sign changes in (5),
in comparison with (3), come from different ways of numbering
the array elements: in [4, Fig. 2], the elements are numbered
from left to right, while in [17, Fig. 6] they are numbered from
right to left.
Now we compare our correlation model with the data published
in [17], where the data are spatial cross-correlations between the
squares of the envelopes of a two-element array, mounted on a
BS. We do this by considering two models for the AOA PDF at
the BS: the simple model with
$p_\Theta(\theta) = \exp[\kappa\cos(\theta - \alpha)] / (2\pi I_0(\kappa))$, and
the composite model with
$p_\Theta(\theta) = \zeta \exp[\kappa\cos(\theta - \alpha)] / (2\pi I_0(\kappa)) + (1 - \zeta)/(2\pi)$,
where $0 \le \zeta \le 1$ indicates the amount of directional
reception. The composite PDF reduces to the von Mises PDF for
$\zeta = 1$, and simplifies to the uniform PDF for $\zeta = 0$.
Consequently, the associated spatial correlation functions for a
two-element array at a BS can be written as:

$$\phi_{12}(0) = I_0\left(\sqrt{\kappa^2 - 4\pi^2 (d/\lambda)^2 + j 4\pi\kappa (d/\lambda)\cos\alpha}\right) / I_0(\kappa), \quad (6)$$

$$\phi_{12}(0) = \zeta\, I_0\left(\sqrt{\kappa^2 - 4\pi^2 (d/\lambda)^2 + j 4\pi\kappa (d/\lambda)\cos\alpha}\right) / I_0(\kappa) + (1 - \zeta)\, J_0(2\pi d/\lambda). \quad (7)$$
Figs. 7-8 show Lee’s correlation data, plotted together with
$|\phi_{12}(0)|^2$ calculated according to (6) and (7) for both
models. For a given $\alpha$ (known a priori for each data set), the
unknown $\kappa$ for the simple model and the unknown pair
$(\kappa, \zeta)$ for the composite model are estimated by the
nonlinear least squares method (implemented via a systematic
numerical search technique). Based on these figures (and many
others not shown due to space limitations), the von Mises PDF is
able to account for the variations of the correlation versus
antenna spacing with reasonable accuracy (compare our
correlation plots with those drawn in [17] assuming the cosine
PDF and [21] using the truncated uniform PDF, both for the same
data sets. Interestingly, the correlation plots in [17] can also
be considered as curves obtained based on a Gaussian PDF, because
for small BW, the cosine PDF can be approximated by a Gaussian
PDF [21]). Note that in Fig. 7 both models are similar
($\zeta = 0.98$), while in Fig. 8 the composite model shows a much
better fit ($\zeta = 0.74$). In general the composite model was
able to improve the fits obtained by the simple model, which is
not surprising because it has the additional parameter $\zeta$.
This is in agreement with the noise-like signal introduced in
[17].
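The fitting procedure can be illustrated with a plain grid search over $(\kappa, \zeta)$ that minimizes the squared error between $|\phi_{12}(0)|^2$ from (7) and a set of correlation samples. This is only a sketch of one possible "systematic numerical search": the grid, the synthetic data, and the function names are assumptions, and no measured data from [17] are used. The series evaluation of $I_0(\sqrt{w})$ overflows for very large $\kappa$ (such as the sub-degree beamwidths of Figs. 7 and 8), where a scaled Bessel routine would be needed instead.

```python
import math

def i0_of_sqrt(w, terms=80):
    """I_0(sqrt(w)) for complex w, via the series sum_k (w/4)^k / (k!)^2."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= (w / 4) / ((k + 1) ** 2)
    return total

def composite_corr_sq(d_over_lambda, kappa, zeta, alpha):
    """|phi_12(0)|^2 for the composite model (7); zeta = 1 gives the simple model (6)."""
    y = 2.0 * math.pi * d_over_lambda
    w = kappa ** 2 - y ** 2 + 2j * kappa * y * math.cos(alpha)
    directional = i0_of_sqrt(w) / i0_of_sqrt(kappa ** 2)
    isotropic = i0_of_sqrt(-y ** 2)  # equals J_0(2*pi*d/lambda)
    return abs(zeta * directional + (1.0 - zeta) * isotropic) ** 2

def grid_fit(spacings, data, alpha, kappas, zetas):
    """Least-squares estimate of (kappa, zeta) by exhaustive search on a grid."""
    best = None
    for kappa in kappas:
        for zeta in zetas:
            err = sum((composite_corr_sq(d, kappa, zeta, alpha) - v) ** 2
                      for d, v in zip(spacings, data))
            if best is None or err < best[0]:
                best = (err, kappa, zeta)
    return best[1], best[2]

# Synthetic data generated from the model itself (illustration only).
spacings = [0.25 * i for i in range(1, 9)]
alpha = 1.0
data = [composite_corr_sq(d, 8.0, 7 / 10, alpha) for d in spacings]
kappa_hat, zeta_hat = grid_fit(spacings, data, alpha,
                               kappas=[2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
                               zetas=[i / 10 for i in range(11)])
```

With noiseless synthetic samples the search returns the generating parameters whenever they lie on the grid; with measured data the grid would normally be refined around the best point.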
5. CONCLUSION
Space-time processing using antenna arrays over wireless mobile
fading channels offers several advantages in cellular systems,
such as mitigating fading, intersymbol interference, cochannel
interference, etc. Efficient joint use of both space and time
dimensions demands spatio-temporal channel models. As a
basic channel model, we need a two dimensional spatio-temporal
correlation function among the random signals sensed by the
array elements, to characterize the second order dependence
structure of the random channel in both space and time. In this
paper we have proposed a flexible spatio-temporal correlation
function for propagation scenarios with non-isotropic scattering
(signal reception from specific directions). The non-uniform
distribution for the angle of arrival, which characterizes the
non-isotropic scattering, is modeled by the von Mises PDF, which
has previously been shown to be successful in describing the
measured data. The proposed spatio-temporal correlation function is
general enough to include important special cases such as Lee’s
spatio-temporal correlation function and Clarke’s temporal
correlation function, both derived for isotropic scattering.
Moreover, its compact mathematical form facilitates analytical
manipulations of array-based techniques and results in terms of
closed-form expressions for such important fading parameters as
spectral moments (successive derivatives of the correlation
function). Based on two case studies (multiuser detection and
diversity reception) and using the new spatio-temporal
correlation function, we have shown that non-isotropic scattering
(typical of many mobile channel scenarios) has a significant
impact on the performance of array processors, and should be
taken into account in the analysis and design of adaptive antenna
arrays for mobile fading channels.
Theoretically, the new correlation function is applicable to both
MS and BS. However, since practical restrictions limit the use of
multiple antennas at a MS, the proposed correlation function
seems to be of much more use in a BS. Therefore, the empirical
justification of the new correlation function is demonstrated by
comparison with published data collected at a BS.
6. ACKNOWLEDGEMENT
This work has been supported in part by the National Science
Foundation, under the Wireless Initiative Program, Grant
#9979443. The authors appreciate the input provided by Dr. T.
A. Brown at Motorola regarding the multiuser multichannel
detector examples.
7. REFERENCES
[1] J. H. Winters, “Smart antennas for wireless systems,” IEEE Pers.
Commun. Mag., vol. 5, no. 1, pp. 23-27, 1998.
[2] R. Kohno, “Spatial and temporal communication theory using
adaptive antenna array,” IEEE Pers. Commun. Mag., vol. 5, no. 1,
pp. 28-35, 1998.
[3] A. J. Paulraj and C. B. Papadias, “Space-time processing
techniques for wireless communications,” IEEE Signal Processing
Mag., vol. 14, no. 6, pp. 49-83, 1997.
[4] W. C. Y. Lee, “Level crossing rates of an equal-gain predetection
diversity combiner,” IEEE Trans. Commun. Technol., vol. 18, pp.
417-426, 1970.
[5] A. Abdi, H. Allen Barger, and M. Kaveh, “A parametric model for
the distribution of the angle of arrival and the associated correlation
function and power spectrum at the mobile station,” submitted to
IEEE Trans. Vehic. Technol., Sep. 1999.
[6] G. L. Stuber, Principles of Mobile Communication. Boston, MA:
Kluwer, 1996.
[7] R. B. Ertel, P. Cardieri, K. W. Sowerby, T. S. Rappaport, and J. H.
Reed, “Overview of spatial channel models for antenna array
communication systems,” IEEE Pers. Commun. Mag., vol. 5, no. 1,
pp. 10-22, 1998.
[8] L. C. Godara, “Applications of antenna arrays to mobile
communications. Part I: Performance improvement, feasibility, and
system considerations. Part II: Beam-forming and direction-of-
arrival considerations,” Proc. IEEE, vol. 85, pp. 1031-1060 and pp.
1195-1245, 1997.
[9] W. C. Y. Lee, “A study of the antenna array configuration of an M-
branch diversity combining mobile radio receiver,” IEEE Trans.
Vehic. Technol., vol. 20, pp. 93-104, 1971.
[10] F. Adachi, M. T. Feeney, and J. D. Parsons, “Effects of correlated
fading on level crossing rates and average fade durations with
predetection diversity reception,” IEE Proc. F, Commun., Radar,
Signal Processing, vol. 135, pp. 11-17, 1988.
[11] T. A. Brown and M. Kaveh, “Multiuser detection with antenna
arrays in the presence of multipath fading,” in Proc. IEEE Int.
Conf. Acoust., Speech, Signal Processing, Atlanta, GA, 1996, pp.
2662-2665.
[12] T. A. Brown, “The use of antenna arrays in the detection of code
division multiple access signals,” Ph.D. Thesis, Dept. of Elec. Eng.,
University of Minnesota, Minneapolis, MN, June 1995.
[13] S. T. Kim, J. H. Yoo, and H. K. Park, “A spatially and temporally
correlated fading model for array antenna applications,” IEEE
Trans. Vehic. Technol., vol. 48, pp. 1899-1905, 1999.
[14] A. Klein and W. Mohr, “A statistical wideband mobile radio
channel model including the directions-of-arrival,” in Proc. IEEE
Int. Symp. Spread Spectrum Techniques Applications, Mainz,
Germany, 1996, pp. 102-106.
[15] K. I. Pedersen, P. E. Mogensen, and B. H. Fleury, “A stochastic
model of the temporal and azimuthal dispersion seen at the base
station in outdoor propagation environments,” IEEE Trans. Vehic.
Technol., vol. 49, pp. 437-447, 2000.
[16] P. Pajusco, “Experimental characterization of D.O.A at the base
station in rural and urban area,” in Proc. IEEE Vehic. Technol.
Conf., Ottawa, ONT, Canada, 1998, pp. 993-997.
[17] W. C. Y. Lee, “Effects on correlation between two mobile radio
base-station antennas,” IEEE Trans. Commun., vol. 21, pp. 1214-
1224, 1973.
[18] F. Adachi, M. T. Feeney, A. G. Williamson, and J. D. Parsons,
“Crosscorrelation between the envelopes of 900 MHz signals
received at a mobile radio base station site,” IEE Proc. F,
Commun., Radar, Signal Processing, vol. 133, pp. 506-512, 1986.
[19] J. Salz and J. H. Winters, “Effect of fading correlation on adaptive
arrays in digital mobile radio,” IEEE Trans. Vehic. Technol., vol.
43, pp. 1049-1057, 1994.
[20] T. Trump and B. Ottersten, “Estimation of nominal direction of
arrival and angular spread using an array of sensors,” Signal
Processing, vol. 50, pp. 57-69, 1996.
[21] M. Kalkan and R. H. Clarke, “Prediction of the space-frequency
correlation function for base station diversity reception,” IEEE
Trans. Vehic. Technol., vol. 46, pp. 176-184, 1997.
[22] U. Martin, “Spatio-temporal radio channel characteristics in urban
macrocells,” IEE Proc. Radar, Sonar, Navig., vol. 145, pp. 42-49,
1998.
[23] K. V. Mardia, Statistics of Directional Data. London: Academic,
1972.
Figure 1. Isotropic scattering in an open area (circles are
scatterers).
Figure 2. Non-isotropic scattering in a narrow street.
Figure 3. Asymptotic efficiency of two multiuser array
detectors (asymptotic efficiency $\eta_1(\gamma)$ versus SNR $\gamma$ in dB).
Figure 4. Asymptotic efficiency of two multiuser array
detectors (asymptotic efficiency $\eta_1(\gamma)$ versus SNR $\gamma$ in dB).
Figure 5. Bit error rate of BPSK with two-branch MRC
($\log_{10} P_b(\gamma)$ versus SNR $\gamma$ in dB).
Figure 6. Bit error rate of BPSK with two-branch MRC
($\log_{10} P_b(\gamma)$ versus SNR $\gamma$ in dB).
Figure 7. Correlation coefficient versus antenna spacing.
Simple: BW = 0.5°, Composite: BW = 0.5°, $\zeta$ = 0.98
Figure 8. Correlation coefficient versus antenna spacing.
Simple: BW = 0.4°, Composite: BW = 0.2°, $\zeta$ = 0.74
A BATCH SUBSPACE ICA ALGORITHM.
Ali MANSOUR and Noboru OHNISHI
Bio-Mimetic Control Research Center (RIKEN),
2271-130, Anagahora, Shimoshidami, Moriyama-ku, Nagoya 463 (JAPAN)
email: mansour@nagoya.riken.go.jp and ohnishi@ohnishi.nuie.nagoya-u.ac.jp
http://www.bmc.riken.go.jp
ABSTRACT
For the blind separation of sources (BSS) problem (or
independent component analysis (ICA)), it has been shown
in many situations that adaptive subspace algorithms
are very slow and require significant computational effort.
In a previous publication, we proposed a modified subspace
algorithm for stationary signals. But that algorithm was
limited to stationary signals and its convergence was not
fast enough.
Here, we propose a batch subspace algorithm. The
experimental study shows that this algorithm is very fast, but
its performance is not sufficient to completely separate the
independent components of the signals. On the other hand,
this algorithm can be used as a pre-processing algorithm to
initialize other adaptive subspace algorithms.
Keywords: blind separation of sources, ICA, subspace
methods, Lagrange method, Cholesky decomposition.
1. INTRODUCTION
The blind separation of sources (BSS) problem [1] (or the
Independent Component Analysis "ICA" problem [2]) is a
recent and important problem in signal processing. In this
problem, one should estimate, using the output signals of an
unknown channel (i.e. the observed signals or the mixing
signals), the unknown input signals of that channel (i.e. the
sources). The sources are assumed to be statistically
independent of each other.
The BSS problem was first proposed in a biological context [3].
In fact, this problem arises in many different situations:
speech enhancement [4], separation of seismic signals
[5], source separation applied to nuclear reactor
monitoring [6], airport surveillance [7], noise removal from
biomedical signals [8], etc.
Since 1985, many researchers have been interested in
BSS [9, 10, 11, 12]. Most of the algorithms deal with a linear
channel model: instantaneous mixtures (i.e. a memoryless
channel) or convolutive mixtures (i.e. the channel effect
can be considered as a linear filter). The criteria of those
algorithms were generally based on high order
statistics [13, 14, 15]. Recently, by using only second order
statistics, some subspace methods have been explored
to separate blindly the sources in the case of convolutive
mixtures [16, 17].
In previous works, we proposed two subspace approaches
using LMS [18, 17] or a conjugate gradient algorithm [19]
to minimize subspace criteria. Those criteria were derived
from the generalization of the method proposed by
Gesbert et al. [20] for blind identification (footnote 1). To
improve the convergence speed of our algorithms, we proposed
a modified subspace algorithm for stationary signals [21]. But
that algorithm was limited to stationary signals and its
convergence was not fast enough. Here, we propose a new
subspace algorithm, which improves the performance of our
previous methods.
2. MODEL, ASSUMPTIONS & CRITERION
Let $Y(n)$ denote the $q \times 1$ mixture vector obtained from $p$
unknown and statistically independent sources $S(n)$, and let the
$q \times p$ polynomial matrix $H(z) = (h_{ij}(z))$ denote the channel
effect (see fig. 1). In this paper, we assume that the filters
$h_{ij}(z)$ are causal finite impulse response (FIR) filters. Let $M$
denote the highest degree (see footnote 2) of the filters $h_{ij}(z)$.
In this case, $Y(n)$ can be written as:
$$ Y(n) = \sum_{i=0}^{M} H(i)\, S(n-i), \tag{1} $$
where $S(n-i)$ is the $p \times 1$ source vector at time $(n-i)$ and
$H(i)$ is the real $q \times p$ matrix corresponding to the filter
matrix $H(z)$ at time $i$.
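As a minimal sketch of model (1), the following pure-Python function forms the mixture samples from a list of coefficient matrices H(0), ..., H(M) and a list of source vectors. The function name and the toy values (p = 2, q = 3, M = 1) are assumptions chosen for illustration only; they are not the settings used in the experiments.

```python
def convolutive_mixture(H, S):
    """Return Y(n) = sum_{i=0}^{M} H(i) S(n - i) for n = M, ..., len(S) - 1.

    H: list of M + 1 matrices (q rows of p numbers each), H[i] = H(i).
    S: list of source vectors S(0), S(1), ... (each of length p).
    Only time indices with a full history of M past samples are returned.
    """
    M = len(H) - 1
    q = len(H[0])
    p = len(H[0][0])
    Y = []
    for n in range(M, len(S)):
        y = [0.0] * q
        for i in range(M + 1):
            for r in range(q):
                for c in range(p):
                    y[r] += H[i][r][c] * S[n - i][c]
        Y.append(y)
    return Y


# Toy example (hypothetical values): p = 2 sources, q = 3 sensors, M = 1.
H = [
    [[1, 0], [0, 1], [1, 1]],  # H(0)
    [[0, 1], [1, 0], [0, 0]],  # H(1)
]
S = [[1, 2], [3, 4], [5, 6]]   # S(0), S(1), S(2)
Y = convolutive_mixture(H, S)  # holds Y(1) and Y(2)
```

With M = 0 the same function reduces to an instantaneous (memoryless) mixture.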
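Let $Y_N(n)$ (resp. $S_{M+N}(n)$) denote the $q(N+1) \times 1$
(resp. $(M+N+1)p \times 1$) vector given by:
$$ Y_N(n) = \begin{pmatrix} Y(n) \\ \vdots \\ Y(n-N) \end{pmatrix}, \qquad S_{M+N}(n) = \begin{pmatrix} S(n) \\ \vdots \\ S(n-M-N) \end{pmatrix}. $$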
1 In the identification problem, the authors generally assume
that they have one source and that the source is an i.i.d. signal.
2 M is called the degree of the filter matrix H(z).
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
Sub-space method
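By using N > q observations of the mixture vector, we can formulate
the model (1) in another form:
$$ Y_N(n) = T_N(H)\, S_{M+N}(n), \tag{2} $$
where $T_N(H)$ is the Sylvester matrix corresponding to $H(z)$. The
$q(N+1) \times p(M+N+1)$ matrix $T_N(H)$ is given by [22] as:
$$ T_N(H) = \begin{pmatrix} H(0) & H(1) & \cdots & H(M) & 0 & \cdots & 0 \\ 0 & H(0) & \cdots & H(M-1) & H(M) & \ddots & \vdots \\ \vdots & \ddots & \ddots & & & \ddots & 0 \\ 0 & \cdots & 0 & H(0) & H(1) & \cdots & H(M) \end{pmatrix} $$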
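The block structure of the Sylvester matrix can be sketched in a few lines. The code below places H(i) at block row k and block column k + i, and then checks relation (2) on toy numbers. The function name and all numeric values are hypothetical, chosen only so the result can be verified by hand.

```python
def sylvester_matrix(H, N):
    """Build T_N(H) of size q(N+1) x p(M+N+1) from H = [H(0), ..., H(M)]."""
    M = len(H) - 1
    q = len(H[0])
    p = len(H[0][0])
    T = [[0] * (p * (M + N + 1)) for _ in range(q * (N + 1))]
    for k in range(N + 1):          # block row k
        for i in range(M + 1):      # H(i) sits at block column k + i
            for r in range(q):
                for c in range(p):
                    T[k * q + r][(k + i) * p + c] = H[i][r][c]
    return T


# Toy channel (hypothetical values): p = 2, q = 3, M = 1, and N = 1.
H = [
    [[1, 0], [0, 1], [1, 1]],  # H(0)
    [[0, 1], [1, 0], [0, 0]],  # H(1)
]
T = sylvester_matrix(H, N=1)

# Stacked sources [S(2); S(1); S(0)] with S(0) = (1, 2), S(1) = (3, 4), S(2) = (5, 6).
S_stack = [5, 6, 3, 4, 1, 2]
# Relation (2): the product gives the stacked observations [Y(2); Y(1)].
Y_stack = [sum(t * s for t, s in zip(row, S_stack)) for row in T]
```

In this toy case the matrix is square (6 x 6); in general it is tall whenever q(N+1) > p(M+N+1), which is what makes a left inverse possible.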
It was proved in [23] that the rank of the Sylvester matrix is
$\mathrm{rank}\, T_N(H) = p(N+1) + \sum_{i=1}^{p} M_i$, where $M_i$ is
the degree of the $i$th column of $H(z)$ (the degree of a column is
defined as the highest degree of the filters in this column). Now, it
is easy to prove that the Sylvester matrix has full rank and is left
invertible if each column of the polynomial matrix $H(z)$ has the same
degree and $N > Mp$ (see [24] for more details). From equation (2), one
can conclude that the separation of the sources can be achieved by
estimating a $(M+N+1)p \times q(N+1)$ left inverse matrix $G$ of the
Sylvester matrix. To estimate $G$, one can use the criterion proposed
in [17], obtained from the generalization of the criterion in [20]:
$$ \min_G \mathcal{C}(G) = E\, \| (I \;\; 0)\, G\, Y_N(n) - (0 \;\; I)\, G\, Y_N(n+1) \|^2, \tag{3} $$
where $E$ stands for the expectation, $I$ is the identity matrix and
$0$ is a zero matrix of appropriate dimensions. It has been shown in
[17] that the above minimization leads to a matrix $G^*$ such that:
$$ \mathrm{Perf} = G^* T_N(H) = \mathrm{diag}(\mathbf{M}, \cdots, \mathbf{M}), \tag{4} $$
where $\mathbf{M}$ is any $p \times p$ matrix. Using the last equation,
it becomes clear that the separation is reduced to the separation of an
instantaneous mixture with a mixing matrix $\mathbf{M}$. In other
words, this algorithm can be decomposed into two steps. In the first
step, using only second-order statistics, we reduce the convolutive
mixture problem to an instantaneous mixture (deconvolution step). In
the second step, we only have to separate sources from a simple
instantaneous mixture (typically, most of the instantaneous mixture
algorithms are based on fourth-order statistics).
Finally, to avoid spurious solutions (i.e. a singular matrix
$\mathbf{M}$), one must minimize that criterion subject to a
constraint [17]:
$$ \text{subject to} \quad G_0 R_N(n) G_0^T = I, \tag{5} $$
where $R_N(n) = E\, Y_N(n) Y_N^T(n)$ and the $p \times q(N+1)$ matrix
$G_0$ stands for the first block row of
$G = (G_0^T \cdots G_{(M+N)}^T)^T$. The minimization of the above
criterion under this constraint using an LMS algorithm was discussed in
our previous work [17]. In addition, a modified version of the above
criterion was minimized using a conjugate gradient algorithm [19].
3. ALGORITHM
From the previous section, it is clear that the minimization of the
criterion (3) should be done subject to $p^2$ constraints (using the
symmetric form of equation (5), the number of constraints can be
reduced to $p(p+1)/2$). Let const denote the constraint vector (i.e.
$\mathrm{const} = \mathrm{Vec}(G_0 R_N(n) G_0^T - I)$, where Vec is the
operator that maps a $p \times q$ matrix to a $pq$ vector). The
minimization of the criterion (3) subject to the constraints (5) can be
formulated using the Lagrange method as:
$$ \mathcal{L}(G, \Lambda) = \mathcal{C}(G) - \Lambda\, \mathrm{const}, \tag{6} $$
where $\Lambda$ is a row vector of Lagrange parameters. The
minimization of the above equation with respect to $\Lambda$ leads to
the constraint equation (5). Using the derivative
$\partial \mathcal{C}(G) / \partial G$ given in [17] together with
equations (5) and (6), one can write:
$$ \frac{\partial \mathcal{L}(G, \Lambda)}{\partial G} = \begin{pmatrix} I_p & 0 & 0 \\ 0 & 2 I_{(M+N-1)p} & 0 \\ 0 & 0 & I_p \end{pmatrix} G R_N(n) - \begin{pmatrix} 0 & I_{(M+N)p} \\ 0 & 0 \end{pmatrix} G R_N^T(n+1) - \begin{pmatrix} 0 & 0 \\ I_{(M+N)p} & 0 \end{pmatrix} G R_N(n+1) - \begin{pmatrix} 2 \Gamma\, G_0 R_N(n) \\ 0 \end{pmatrix}, $$
where $R_N(n+1) = E\, Y_N(n) Y_N^T(n+1)$ and $I_l$ is the $l \times l$
identity matrix. Setting the above derivative to zero, and after some
algebraic operations, one finds that the block rows of the optimal
$G^*$ should satisfy:
$$ G_0 R_N(n) G_0^T = I, \tag{7} $$
$$ 2\, G_i R_N(n) = G_{(i+1)} R_N^T(n+1) + G_{(i-1)} R_N(n+1), \tag{8} $$
$$ G_{(M+N)} R_N(n) = G_{(M+N-1)} R_N(n+1), \tag{9} $$
where $1 \le i \le M+N-1$. Let $A = R_N^T(n+1) R_N^{-1}(n)$ and
$B = R_N(n+1) R_N^{-1}(n)$; note that $A$ and $B$ exist if and only if
(iff) $R_N(n)$ is full rank (see footnote 5). Finally, using some
algebraic operations, we can prove that the previous system of matrix
equations can be solved by a recursion formula:
$$ G_{(M+N-i)} = G_{(M+N-i-1)} D_i, \tag{10} $$
where $0 \le i \le M+N-1$, and $G_0$ can be obtained from the first
equation (7) using a simple Cholesky decomposition. In addition, the
matrices $D_i$ can be obtained by:
$$ D_{(i+1)} = B\, (2I - D_i A)^{-1}, \tag{11} $$
where $0 \le i \le M+N-1$ and $D_0 = B$. Even if relationships (10) and
(11) look complicated, the time needed to obtain the matrix $G$ remains
small (see footnote 6) compared with the time needed for the
convergence of the LMS version [17] or even the conjugate gradient
version [21, 19].
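The recursion can be sketched as follows, assuming it reads G_(M+N-i) = G_(M+N-i-1) D_i with D_0 = B and D_(i+1) = B (2I - D_i A)^(-1), which is the form consistent with (8) and (9). The helper names, the small 2 x 2 matrices A and B, and the choice G_0 = I are hypothetical; in the paper A and B come from the covariance matrices and G_0 from the constraint (7).

```python
def mat_mul(X, Y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def mat_inv(X):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(X)
    aug = [list(map(float, X[r])) + [float(r == c) for c in range(n)]
           for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def batch_blocks(G0, A, B, K):
    """Return [G_0, ..., G_K] from the recursion, with K = M + N."""
    n = len(A)
    D = [B]                                   # D_0 = B
    for _ in range(K - 1):                    # D_{i+1} = B (2I - D_i A)^{-1}
        DA = mat_mul(D[-1], A)
        X = [[2.0 * (r == c) - DA[r][c] for c in range(n)] for r in range(n)]
        D.append(mat_mul(B, mat_inv(X)))
    G = [G0]
    for k in range(1, K + 1):                 # G_k = G_{k-1} D_{K-k}
        G.append(mat_mul(G[-1], D[K - k]))
    return G


# Hypothetical small matrices standing in for A, B and G_0.
A = [[0.3, 0.1], [0.0, 0.2]]
B = [[0.3, 0.0], [0.1, 0.2]]
G0 = [[1.0, 0.0], [0.0, 1.0]]
K = 4
G = batch_blocks(G0, A, B, K)
```

Because every block row follows from G_0 by a fixed number of small matrix products and inversions, no iterative search is involved, which is where the speed of the batch approach comes from.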
4. EXPERIMENTAL RESULTS
The experiments discussed here were conducted using two sources
(p = 2) with a uniform probability density function (pdf) and four
sensors (q = 4); the degree of H(z) was chosen as M = 4.
To show the performance of the subspace criterion, the matrix
Perf = G* T_N(H) is plotted. Recall that the deconvolution is achieved
iff the matrix Perf is block diagonal, as shown in equation (4).
Figure 2 shows the performance of the batch subspace algorithm
discussed in this paper. It is clear from figure 2 that the first step
of the algorithm (the deconvolution) was not satisfactorily achieved:
Perf is not block diagonal as in equation (4). This problem arises
because the criterion (3) is a flat function around its minima (see
figure 2).
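One way to quantify how far a given Perf matrix is from the block diagonal form of equation (4) is to measure the share of its energy that lies outside the p x p diagonal blocks. The index below is a hypothetical diagnostic introduced here for illustration, not a measure used in the paper; it also does not check that all diagonal blocks are equal to the same matrix M.

```python
def off_block_ratio(perf, p):
    """Fraction of the squared entries of `perf` outside the p x p diagonal blocks."""
    total = 0.0
    off = 0.0
    for r, row in enumerate(perf):
        for c, v in enumerate(row):
            total += v * v
            if r // p != c // p:      # entry belongs to an off-diagonal block
                off += v * v
    return off / total if total > 0 else 0.0


# Ideal case diag(M, M) with a hypothetical 2 x 2 block M.
ideal = [
    [2.0, 1.0, 0.0, 0.0],
    [0.5, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.5, 3.0],
]
# Same matrix with one leaked entry in an off-diagonal block.
leaky = [row[:] for row in ideal]
leaky[0][2] = 1.0

ratio_ideal = off_block_ratio(ideal, 2)
ratio_leaky = off_block_ratio(leaky, 2)
```

A ratio of zero corresponds to a perfectly block diagonal matrix; larger values indicate residual convolutive mixing.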
Figure 3 shows the performance results and the criterion convergence
of the LMS algorithm (first column), and those of the same LMS
algorithm when the matrix G is initialized using the result of the
batch algorithm (second column). The time needed to reach the minimum
with the initialized version was almost half the time needed by the
non-initialized version. Figures 3 (c) and (d) show the criterion
convergence (the stop condition was the limit on the number of samples,
i.e. 10000). The experimental studies show that the conjugate gradient
version of the subspace algorithm also converges faster and leads to
better performance when it is initialized using the proposed batch
algorithm (these results are omitted in this short paper).
5 It is easy to prove that R_N(n) is full rank iff some additive
independent noise is added to the observed signals, because one of the
subspace assumptions is q > p. On the other hand, using the criterion
(3), one can prove the existence of some spurious minima if the model
contains additive noise (the proof is omitted here because of the page
limit). However, the experimental study shows that one still obtains
good results for a signal-to-noise ratio (SNR) of 20 dB. In our
simulations, we added Gaussian noise with SNR > 20 dB.
6 Indeed, using a C program on a Sun Ultra 30 Creator workstation, it
takes a few minutes (less than 5) to obtain the matrix G, whereas the
conjugate gradient algorithm needs 40 to 100 minutes to converge and
the LMS algorithm needs a few hours.
The second step of the algorithm consists of the separation of a
residual instantaneous mixture (corresponding to M, see equation (4)).
This separation can be performed using any source separation algorithm
applicable to instantaneous mixtures. Here, we chose the minimization
of a cross-cumulant criterion using the Levenberg-Marquardt method
[25]. Figure 4 shows the different signals (see figure 1). It is clear
that the sources X and the estimated signals S are independent signals,
that the vector Z, output of the subspace criterion, corresponds to an
instantaneous mixture, and that the observed vector Y corresponds to a
convolutive mixture (see [26, 27]).
Finally, the second-order and high-order statistics were estimated
according to the method described in [28].
5. CONCLUSION
In this paper, we propose a batch algorithm for source separation in
convolutive mixtures based on a subspace approach. Like the other
subspace methods, this new algorithm requires the number of sensors to
be larger than the number of sources. In addition, it allows the
separation of convolutive mixtures of independent sources using mainly
second-order statistics: only a simple residual instantaneous mixture,
whose separation generally needs high-order statistics, remains to be
separated.
The experimental study shows that the present algorithm can be used to
initialize an adaptive subspace algorithm, and that the initialized
algorithms need less time to converge. These results were discussed for
two subspace algorithms, based on LMS and on a conjugate gradient
method. Both become more stable and faster when they are initialized
using the present algorithm.
REFERENCES
[1] C. Jutten and J. Herault, “Blind separation of sources, Part I: An adaptive algorithm based on a neuromimetic architecture,” Signal Processing, vol. 24, no. 1, pp. 1-10, 1991.
[2] P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, no. 3, pp. 287-314, April 1994.
[3] B. Ans, J. C. Gilhodes, and J. Herault, “Simulation de reseaux neuronaux (sirene). II. hypothese de decodage du message de mouvement porte par les afferences fusoriales IA et II par un mecanisme de plasticite synaptique,” C. R. Acad. Sci. Paris, vol. serie III, pp. 419-422, 1983.
[4] L. Nguyen Thi and C. Jutten, “Blind sources separation for convolutive mixtures,” Signal Processing, vol. 45, no. 2, pp. 209-229, 1995.
[5] N. Thirion, J. Mars, and J. L. Boelle, “Separation of seismic signals: A new concept based on a blind algorithm,” in Signal Processing VIII, Theories and Applications, Trieste, Italy, September 1996, pp. 85-88, Elsevier.
[6] G. D’urso and L. Cai, “Sources separation method applied to reactor monitoring,” in Proc. Workshop Athos working group, Girona, Spain, June 1995.
[7] E. Chaumette, P. Comon, and D. Muller, “Application of ICA to airport surveillance,” in HOS 93, South Lake Tahoe, California, 7-9 June 1993, pp. 210-214.
[8] A. Kardec Barros, A. Mansour, and N. Ohnishi, “Removing artifacts from ECG signals using independent components analysis,” Neuro Computing, vol. 22, pp. 173-186, 1999.
[9] J. F. Cardoso and P. Comon, “Tensor-based independent component analysis,” in Signal Processing V, Theories and Applications, L. Torres, E. Masgrau, and M. A. Lagunas, Eds., Barcelona, Spain, 1990, pp. 673-676, Elsevier.
[10] S. I. Amari, A. Cichocki, and H. H. Yang, “A new learning algorithm for blind signal separation,” in Neural Information Processing System 8, Eds. D.S. Touretzky et al., 1995, pp. 757-763.
[11] O. Macchi and E. Moreau, “Self-adaptive source separation using correlated signals and cross-cumulants,” in Proc. Workshop Athos working group, Girona, Spain, June 1995.
[12] A. Mansour and C. Jutten, “A direct solution for blind separation of sources,” IEEE Trans. on Signal Processing, vol. 44, no. 3, pp. 746-748, March 1996.
[13] M. Gaeta and J. L. Lacoume, “Sources separation without a priori knowledge: the maximum likelihood solution,” in Signal Processing V, Theories and Applications, L. Torres, E. Masgrau, and M. A. Lagunas, Eds., Barcelona, Spain, 1990, pp. 621-624, Elsevier.
[14] N. Delfosse and P. Loubaton, “Adaptive blind separation of independent sources: A deflation approach,” Signal Processing, vol. 45, no. 1, pp. 59-83, July 1995.
[15] A. Mansour and C. Jutten, “Fourth order criteria for blind separation of sources,” IEEE Trans. on Signal Processing, vol. 43, no. 8, pp. 2022-2025, August 1995.
[16] A. Gorokhov and P. Loubaton, “Subspace based techniques for second order blind separation of convolutive mixtures with temporally correlated sources,” IEEE Trans. on Circuits and Systems, vol. 44, pp. 813-820, September 1997.
[17] A. Mansour, C. Jutten, and P. Loubaton, “An adaptive subspace algorithm for blind separation of independent sources in convolutive mixture,” IEEE Trans. on Signal Processing, vol. 48, no. 2, February 2000.
[18] A. Mansour, C. Jutten, and P. Loubaton, “Subspace method for blind separation of sources and for a convolutive mixture model,” in Signal Processing VIII, Theories and Applications, Trieste, Italy, September 1996, pp. 2081-2084, Elsevier.
[19] A. Mansour, A. Kardec Barros, and N. Ohnishi, “Subspace adaptive algorithm for blind separation of convolutive mixtures by conjugate gradient method,” in The First International Conference and Exhibition Digital Signal Processing (DSP ’98), Moscow, Russia, June 30-July 3 1998, pp. I-252-I-260.
[20] D. Gesbert, P. Duhamel, and S. Mayrargue, “Subspace-based adaptive algorithms for the blind equalization of multichannel FIR filters,” in Signal Processing VII, Theories and Applications, M.J.J. Holt, C.F.N. Cowan, P.M. Grant, and W.A. Sandham, Eds., Edinburgh, Scotland, September 1994, pp. 712-715, Elsevier.
[21] A. Mansour and N. Ohnishi, “A blind separation algorithm based on subspace approach,” in IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP’99), Antalya, Turkey, June 20-23 1999, pp. 268-272.
[22] T. Kailath, Linear systems, Prentice Hall, 1980.
[23] R. Bitmead, S. Kung, B. D. O. Anderson, and T. Kailath, “Greatest common divisors via generalized Sylvester and Bezout matrices,” IEEE Trans. on Automatic Control, vol. 23, no. 6, pp. 1043-1047, December 1978.
[24] A. Mansour, C. Jutten, and P. Loubaton, “Robustesse des hypotheses dans une methode sous-espace pour la separation de sources,” in Actes du XVIeme colloque GRETSI, Grenoble, France, septembre 1997, pp. 111-114.
[25] A. Mansour and N. Ohnishi, “Multichannel blind separation of sources algorithm based on cross-cumulant and the Levenberg-Marquardt method,” IEEE Trans. on Signal Processing, vol. 47, no. 11, pp. 3172-3175, November 1999.
[26] C. G. Puntonet, A. Mansour, and C. Jutten, “Geometrical algorithm for blind separation of sources,” in Actes du XVeme colloque GRETSI, Juan-Les-Pins, France, 18-21 September 1995, pp. 273-276.
[27] A. Prieto, C. G. Puntonet, and B. Prieto, “A neural algorithm for blind separation of sources based on geometric properties,” Signal Processing, vol. 64, no. 3, pp. 315-331, 1998.
[28] A. Mansour, A. Kardec Barros, and N. Ohnishi, “Comparison among three estimators for high order statistics,” in Fifth International Conference on Neural Information Processing (ICONIP’98), Kitakyushu, Japan, 21-23 October 1998, pp. 899-902.
COMPARATIVE STUDY OF TWO-DIMENSIONAL MAXIMUM LIKELIHOOD
AND INTERPOLATED ROOT-MUSIC WITH APPLICATION TO
TELESEISMIC SOURCE LOCALIZATION
Pei-Jung Chung*, Alex B. Gershman**, Johann F. Böhme*
* Department of Electrical Engineering and Information Science,
Ruhr University, D-44780 Bochum, Germany
pjc, boehme@sth.ruhr-uni-bochum.de
** Department of Electrical and Computer Engineering,
McMaster University, Hamilton, L8S 4K1 Ontario, Canada
gershman@ieee.org
ABSTRACT
We apply the 2-D broadband Maximum Likelihood (ML) and interpolated
root-MUSIC methods to estimate the azimuth and velocity parameters of
teleseismic events recorded by the GERESS array. A sequential test
based on Likelihood Ratios (LR's) is developed for signal detection.
Our experimental results show that both methods can provide reliable
estimates of signal parameters. However, ML is shown to have better
estimation accuracy and robustness than interpolated root-MUSIC at the
expense of a higher computational cost.
1. INTRODUCTION
The ML and MUSIC techniques are two popular methods in array
processing. Numerous theoretical and numerical studies have shown that
ML outperforms MUSIC in scenarios with low Signal to Noise Ratios
(SNR's), a small number of samples, coherent signals, as well as
closely spaced sources [1]. However, the enormously high computational
cost of ML makes this statistically optimal approach in many cases less
attractive than MUSIC. Therefore, a crucial issue is how to choose a
proper algorithm for a particular application to achieve sufficiently
high performance at an acceptable computational complexity.
In the present work, we apply broadband ML [2] and 2-D interpolated
root-MUSIC [3] to the localization of several teleseismic events using
real data from the GERESS array. A sequential test procedure based on
LR's is used to detect signals within the observation interval. Due to
complicated propagation effects, there may be more than one signal
phase arriving at the same time from the same direction. However,
different signal phases should differ in their velocities. It is worth
noting that the ML method can be directly applied to the broadband
Direction Of Arrival (DOA) estimation problem. On the other hand,
root-MUSIC has to be adapted to the broadband setting, for example by
means of the so-called array interpolation technique [4], which allows
the information from different frequencies to be combined in a coherent
way. In [3] and [6], a high-SNR regional man-made seismic event was
analyzed by means of the ML and interpolated root-MUSIC techniques.
Both methods provided excellent results in this case. Below, we address
the more difficult teleseismic event case, which is characterized by
much lower SNR's and more complicated propagation phenomena relative to
the regional event case. In the teleseismic case, signal detection
becomes a very important issue, since it is almost impossible to
identify weak signals in seismograms (for example, see Fig. 1
displaying a typical seismogram of a teleseismic event).
This work was supported by the German Science Foundation and by the
Natural Sciences and Engineering Research Council (NSERC) of Canada.
The experimental results reported in the present paper demonstrate
that in the teleseismic case, both ML and interpolated root-MUSIC may
be successfully applied to source localization. ML is shown to have
better performance and robustness than interpolated root-MUSIC.
However, the latter approach enjoys simpler implementation.
2. DATA MODEL
Let an array of N sensors receive M broadband signals from far-field
sources. A 2-D array can be assumed, since the length of the vertical
aperture of GERESS is much smaller than that of the horizontal one and
is negligible compared to the seismic signal wavelength. The array
output $\mathbf{x}(t)$ sampled at discrete times $t = 0, \ldots, T-1$
is short-time Fourier-transformed using the so-called Thomson's
multitaper technique [7]:
$$ \mathbf{X}_l(\omega) = \frac{1}{\sqrt{T}} \sum_{t=0}^{T-1} w_l(t)\, \mathbf{x}(t)\, e^{-j\omega t}, \quad (l = 0, \ldots, L-1), \tag{1} $$
where $\{w_l(t)\}_{t=0,\ldots,T-1}$ is the $l$th orthonormal window
function.
For sufficiently large T, the Fourier-transformed data can be
approximately expressed as
$$ \mathbf{X}_l(\omega) = \mathbf{H}(\omega)\, \mathbf{S}_l(\omega) + \mathbf{U}_l(\omega), \tag{2} $$
$$ \mathbf{H}(\omega) = [\mathbf{d}_1(\omega), \ldots, \mathbf{d}_M(\omega)], \tag{3} $$
where $\mathbf{X}_l(\omega) \in \mathbb{C}^{N \times 1}$,
$\mathbf{H}(\omega) \in \mathbb{C}^{N \times M}$,
$\mathbf{S}_l(\omega) \in \mathbb{C}^{M \times 1}$, and
$\mathbf{U}_l(\omega) \in \mathbb{C}^{N \times 1}$ are the observation
vector, the steering matrix, the vector of signal waveforms, and the
vector of sensor noise, respectively. The steering vector
$\mathbf{d}_m(\omega)$ associated with the $m$th signal is given by
$$ \mathbf{d}_m(\omega) = \left[ e^{-j\omega \boldsymbol{\xi}_m^T \mathbf{r}_1}, \ldots, e^{-j\omega \boldsymbol{\xi}_m^T \mathbf{r}_N} \right]^T, \tag{4} $$
where $\mathbf{r}_n = (x_n, y_n)^T$ is the coordinate of the $n$th
sensor. The slowness vector $\boldsymbol{\xi}_m$ is related to the
source azimuth $\alpha_m$ and the respective velocity $V_m$ as follows:
$$ \boldsymbol{\xi}_m = \frac{1}{V_m} [\cos \alpha_m, \sin \alpha_m]^T. \tag{5} $$
The signal waveforms $\mathbf{S}_l(\omega_j)$
($l = 0, \ldots, L-1$; $j = 1, \ldots, J$) are assumed to be
deterministic and unknown. From the asymptotic theory of the Fourier
transform, it is well known that the $\mathbf{X}_l(\omega_j)$
($l = 0, \ldots, L-1$; $j = 1, \ldots, J$) are independent and complex
Gaussian distributed with mean
$\mathbf{H}(\omega_j) \mathbf{S}_l(\omega_j)$ and covariance matrix
$\nu(\omega_j) \mathbf{I}$, where $\nu(\omega_j)$ is the sensor noise
power at the frequency $\omega_j$ and $\mathbf{I}$ is the identity
matrix [2]. The problem is to detect the signals and estimate their
parameters $\{\boldsymbol{\xi}_m\}$, $m = 1, \ldots, M$.
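Relations (4) and (5) can be illustrated with a short sketch: it builds a slowness vector from an azimuth and a velocity, evaluates the steering vector on a small array, and inverts (5) again. The function names, the sensor coordinates (taken to be in km), the 8 km/s velocity and the 2.2 Hz frequency are assumptions for illustration, not the GERESS geometry.

```python
import cmath
import math


def slowness(azimuth_deg, velocity):
    """Slowness vector xi = [cos(alpha), sin(alpha)] / V, as in (5)."""
    a = math.radians(azimuth_deg)
    return (math.cos(a) / velocity, math.sin(a) / velocity)


def steering_vector(omega, xi, sensors):
    """Entries exp(-j * omega * xi^T r_n) for each sensor position r_n, as in (4)."""
    return [cmath.exp(-1j * omega * (xi[0] * x + xi[1] * y)) for x, y in sensors]


def azimuth_velocity(xi):
    """Invert (5): azimuth in degrees within [0, 360) and velocity."""
    az = math.degrees(math.atan2(xi[1], xi[0])) % 360.0
    return az, 1.0 / math.hypot(xi[0], xi[1])


# Hypothetical sensor coordinates in km; omega in rad/s; velocity in km/s.
sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
xi = slowness(146.1, 8.0)
d = steering_vector(2 * math.pi * 2.2, xi, sensors)
az, v = azimuth_velocity(xi)
```

Every entry of the steering vector has unit modulus, so only the phases across the array carry the azimuth and velocity information.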
3. WIDEBAND MAXIMUM LIKELIHOOD
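Based on the independence and asymptotic Gaussianity in the frequency
domain, the approximate wideband log-likelihood function can be
expressed as [2]
$$ l(\boldsymbol{\vartheta}) = \sum_{j=1}^{J} \log \mathrm{tr} \left[ \{ \mathbf{I} - \mathbf{P}(\omega_j, \boldsymbol{\vartheta}) \}\, \hat{\mathbf{R}}_x(\omega_j) \right], \tag{6} $$
where
$$ \boldsymbol{\vartheta} = [\boldsymbol{\xi}_1^T, \ldots, \boldsymbol{\xi}_M^T]^T \tag{7} $$
denotes the unknown slowness vector,
$\mathbf{P}(\omega_j, \boldsymbol{\vartheta})$ is the projection matrix
onto the column space of the matrix $\mathbf{H}(\omega_j)$,
$$ \hat{\mathbf{R}}_x(\omega_j) = \frac{1}{L} \sum_{l=0}^{L-1} \mathbf{X}_l(\omega_j)\, \mathbf{X}_l^H(\omega_j) \tag{8} $$
is the sample spectral density matrix, and $(\cdot)^H$ denotes the
Hermitian transpose. The ML estimate
$\hat{\boldsymbol{\vartheta}}_{\mathrm{ML}}$ is obtained by minimizing
(6) over $\boldsymbol{\vartheta}$.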
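For a single source (M = 1) the projection matrix reduces to d d^H / (d^H d), so the trace in (6) can be evaluated without forming any matrix. The sketch below uses this special case with a coarse azimuth grid search on noise-free synthetic data. The array geometry, frequencies, the fixed velocity of 8 km/s, the numerical floor inside the logarithm and all function names are assumptions for illustration; the paper's estimator searches over azimuth and velocity jointly on real data.

```python
import cmath
import math


def slowness(azimuth_deg, velocity):
    a = math.radians(azimuth_deg)
    return (math.cos(a) / velocity, math.sin(a) / velocity)


def steering(omega, xi, sensors):
    return [cmath.exp(-1j * omega * (xi[0] * x + xi[1] * y)) for x, y in sensors]


def ml_criterion(xi, omegas, snapshots, sensors, floor=1e-12):
    """Sum over frequencies of log tr[(I - P) R_hat] for one source.

    snapshots[j] holds the L data vectors at frequency omegas[j]. With
    P = d d^H / N, the trace equals (1/L) * sum_l (|X_l|^2 - |d^H X_l|^2 / N).
    The floor only guards the logarithm on noise-free data.
    """
    n_sensors = len(sensors)
    total = 0.0
    for omega, X in zip(omegas, snapshots):
        d = steering(omega, xi, sensors)
        resid = 0.0
        for x in X:
            energy = sum(abs(v) ** 2 for v in x)
            proj = abs(sum(dn.conjugate() * v for dn, v in zip(d, x))) ** 2 / n_sensors
            resid += energy - proj
        total += math.log(max(resid / len(X), floor))
    return total


# Hypothetical 4-sensor array (km), three frequencies (Hz), true azimuth 60 degrees.
sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.5, 1.2)]
omegas = [2 * math.pi * f for f in (1.0, 2.0, 3.0)]
xi_true = slowness(60.0, 8.0)
snapshots = [[[(1.0 + 0.5 * l) * v for v in steering(w, xi_true, sensors)]
              for l in range(3)] for w in omegas]

# Grid search over azimuth in 1-degree steps at the assumed velocity.
best_az = min(range(360),
              key=lambda a: ml_criterion(slowness(a, 8.0), omegas, snapshots, sensors))
```

For M > 1 the projector no longer has this closed form and the search becomes multidimensional, which is the source of the computational cost discussed in the text.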
4. WIDEBAND INTERPOLATED
ROOT-MUSIC
In this section, we describe the 2-D extension [3] of the wideband
interpolated root-MUSIC algorithm [5] that will be applied for joint
estimation of the azimuth and velocity parameters of seismic sources.
Let the 2-D array be divided into two subarrays of $N_s$ sensors each,
denoted as subarrays (a) and (b), respectively. Since the outline of
the algorithm is similar for each subarray, in the sequel we consider
only the subarray (a). Its observation vector can be modeled as
$$ \mathbf{X}_{l,a}(\omega) = \mathbf{H}_a(\omega)\, \mathbf{S}_l(\omega) + \mathbf{U}_{l,a}(\omega). \tag{9} $$
This subarray will be used for interpolation of the set of J virtual
ULA's with the interelement spacings $d_c \omega_c / \omega_j$
($j = 1, \ldots, J$), where $\omega_c$ is the central frequency, and
$d_c$ is the interelement spacing of the virtual ULA at $\omega_c$. To
obtain the same array manifold for each frequency, the interpolation
matrices $\mathbf{B}_j$ can be designed in a regular way [4]. The
coherently averaged covariance matrix can be obtained as
$$ \mathbf{R}_a = \frac{1}{J} \sum_{j=1}^{J} \mathbf{B}_j^H\, \hat{\mathbf{R}}_a(\omega_j)\, \mathbf{B}_j, \tag{10} $$
where
$$ \hat{\mathbf{R}}_a(\omega_j) = \frac{1}{L} \sum_{l=0}^{L-1} \mathbf{X}_{l,a}(\omega_j)\, \mathbf{X}_{l,a}^H(\omega_j). \tag{11} $$
The noise covariance matrix after the coherent processing can be
computed as
$$ \mathbf{Q} = \frac{1}{J} \sum_{j=1}^{J} \hat{\nu}(\omega_j)\, \mathbf{B}_j^H \mathbf{B}_j, \tag{12} $$
where $\hat{\nu}(\omega_j)$ is some estimate of sensor noise at the
frequency $\omega_j$. The matrix
$$ \mathbf{R}_q = \mathbf{Q}^{-1/2}\, \mathbf{R}_a\, \mathbf{Q}^{-1/2} \tag{13} $$
is the spectral density matrix after prewhitening. The
eigendecomposition of this matrix yields
$$ \mathbf{R}_q = \mathbf{U}_S \boldsymbol{\Lambda}_S \mathbf{U}_S^H + \mathbf{U}_N \boldsymbol{\Lambda}_N \mathbf{U}_N^H, \tag{14} $$
where the matrices $\mathbf{U}_S$ and $\mathbf{U}_N$ contain the
signal- and noise-subspace eigenvectors, respectively. In turn, the
diagonal matrices $\boldsymbol{\Lambda}_S$ and $\boldsymbol{\Lambda}_N$
contain the signal- and noise-subspace eigenvalues, respectively.
The root-MUSIC polynomial has the form
$$ D_a(z) = \mathbf{d}^T(1/z)\, \mathbf{Q}^{-1/2}\, \mathbf{U}_N \mathbf{U}_N^H\, \mathbf{Q}^{-1/2}\, \mathbf{d}(z), \tag{15} $$
where $\mathbf{d}(z) = [1, z^{-1}, \ldots, z^{-(N-1)}]^T$. Let
$\{z_{a,1}, \ldots, z_{a,M}\}$ denote the M signal roots of (15), which
are sorted based on their proximity to the unit circle. Similarly, we
can find M signal roots $\{z_{b,1}, \ldots, z_{b,M}\}$ for subarray
(b). Combining the results from these two virtual subarrays, we can
find $M^2$ candidate estimates of $\boldsymbol{\xi}$ by solving the
system
$$ \Delta x_a\, \xi_x + \Delta y_a\, \xi_y = \frac{\arg z_{a,i}}{\omega_c}, \qquad \Delta x_b\, \xi_x + \Delta y_b\, \xi_y = \frac{\arg z_{b,k}}{\omega_c} \tag{16} $$
for $i, k = 1, \ldots, M$, where $\Delta x_a$, $\Delta y_a$,
$\Delta x_b$, and $\Delta y_b$ define the interelement spacings of the
virtual arrays (a) and (b), respectively. The final estimate of
$\boldsymbol{\xi}$ is then obtained by selecting the M pairs
$(\xi_x, \xi_y)$ which correspond to the maximal values of the 2-D
MUSIC spectral function. The estimates of azimuth and velocity
$\hat{\boldsymbol{\vartheta}}_{\mathrm{MUSIC}}$ can be obtained from
these M pairs using (5).
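The pairing step can be sketched as a set of 2 x 2 linear solves, one for each combination of a root from subarray (a) with a root from subarray (b). The code assumes the system has the form "spacing vector times slowness equals arg(z) / omega_c" with no additional sign factor; the virtual-array spacing vectors, the central frequency and the synthetic roots are hypothetical values chosen so the true slowness vectors are known.

```python
import cmath
import math


def candidate_slowness(roots_a, roots_b, geom_a, geom_b, omega_c):
    """Solve the 2 x 2 system for every root pair (i, k) by Cramer's rule.

    geom_a = (dx_a, dy_a) and geom_b = (dx_b, dy_b) are the interelement
    spacing vectors of the two virtual arrays. Returns M * M pairs (xi_x, xi_y).
    """
    (ax, ay), (bx, by) = geom_a, geom_b
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("virtual arrays must not be parallel")
    out = []
    for za in roots_a:
        ra = cmath.phase(za) / omega_c
        for zb in roots_b:
            rb = cmath.phase(zb) / omega_c
            out.append(((ra * by - ay * rb) / det, (ax * rb - bx * ra) / det))
    return out


# Hypothetical setup: two perpendicular virtual ULAs with 0.5 km spacing.
omega_c = 2 * math.pi * 2.2
geom_a, geom_b = (0.5, 0.0), (0.0, 0.5)
true_xi = [(0.1, -0.05), (0.02, 0.12)]   # two sources, slowness in s/km

# Synthetic signal roots on the unit circle, consistent with the assumed system.
roots_a = [cmath.exp(1j * omega_c * (geom_a[0] * x + geom_a[1] * y)) for x, y in true_xi]
roots_b = [cmath.exp(1j * omega_c * (geom_b[0] * x + geom_b[1] * y)) for x, y in true_xi]
candidates = candidate_slowness(roots_a, roots_b, geom_a, geom_b, omega_c)
```

Only M of the M * M candidates are true slowness vectors; the text resolves this ambiguity by evaluating the 2-D MUSIC spectral function at each candidate, a step not shown here.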
5. LIKELIHOOD RATIO TEST
In this section, we develop a sequential LR-based test for detecting
the number of signals. Let m denote the hypothetical number of signals.
In each step, the detection problem can be formulated as testing the
hypothesis $K_m$ against the alternative $A_m$:
$K_m$: m signals are present,
$A_m$: more than m signals are present.
Starting from m = 0, this test should be performed stepwise and then
stopped once the hypothesis $K_m$ becomes accepted. Applying the LR
principle, we obtain the following test statistic in the mth step [2]:
$$ t_m = \frac{1}{J} \sum_{j=1}^{J} \log \left( 1 + \frac{n_1}{n_2} F_m(\omega_j) \right) \gtrless t_\alpha, \tag{17} $$
where
$$ F_m(\omega) = \frac{n_2}{n_1} \cdot \frac{ \mathrm{tr} \left[ \{ \mathbf{P}_{m+1}(\omega, \hat{\boldsymbol{\vartheta}}_{m+1}^{\mathrm{ML}}) - \mathbf{P}_m(\omega, \hat{\boldsymbol{\vartheta}}_m^{\mathrm{ML}}) \}\, \hat{\mathbf{R}}_x(\omega) \right] }{ \mathrm{tr} \left[ \{ \mathbf{I} - \mathbf{P}_{m+1}(\omega, \hat{\boldsymbol{\vartheta}}_{m+1}^{\mathrm{ML}}) \}\, \hat{\mathbf{R}}_x(\omega) \right] }, \tag{18} $$
$$ n_1 = L(2m + 4), \qquad n_2 = L(2N - 2), \tag{19} $$
and $\hat{\boldsymbol{\vartheta}}_m^{\mathrm{ML}} \in \mathbb{R}^{2m}$
is the ML estimate of the signal parameter vector. If $t_m$ exceeds the
test threshold $t_\alpha$, the hypothesis will be rejected. The
quantity calculated by $F_m(\omega)$ can be interpreted as an estimate
of the increase in SNR when adding the (m+1)th signal. To be detected,
the power of the (m+1)th signal must be sufficiently high compared to
the noise power. Under the hypothesis $K_m$, the value
$F_m(\omega_j)$ is approximately centrally F-distributed with $n_1$ and
$n_2$ degrees of freedom. The threshold $t_\alpha$ is determined by the
Cornish-Fisher expansion with good accuracy [8]-[9]. Note that the LR
test can be easily implemented if the corresponding ML estimates are
available.
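The stepwise procedure can be sketched once the residual powers tr[(I - P_m) R] at the ML estimates are available for each model order and frequency. Since P_(m+1) - P_m = (I - P_m) - (I - P_(m+1)), the term 1 + (n1/n2) F_m is simply the ratio of two consecutive residuals. The residual values and thresholds below are hypothetical numbers; in the paper the threshold comes from the Cornish-Fisher expansion, which is not reproduced here.

```python
import math


def lr_statistic(resid_m, resid_m1):
    """Test statistic t_m from per-frequency residuals for m and m + 1 signals."""
    J = len(resid_m)
    return sum(math.log(a / b) for a, b in zip(resid_m, resid_m1)) / J


def f_statistic(resid_m, resid_m1, n1, n2):
    """F_m at each frequency, written in terms of the same residuals."""
    return [(n2 / n1) * (a - b) / b for a, b in zip(resid_m, resid_m1)]


def detect_number_of_signals(residuals, thresholds):
    """Sequential test: return the first m whose statistic does not exceed its threshold.

    residuals[m][j] is the residual power with m signals at frequency j;
    thresholds[m] is the test threshold used at step m.
    """
    for m in range(len(residuals) - 1):
        if lr_statistic(residuals[m], residuals[m + 1]) <= thresholds[m]:
            return m
    return len(residuals) - 1


# Hypothetical residual powers for m = 0, 1, 2 at J = 2 frequencies.
residuals = [[10.0, 12.0], [2.0, 3.0], [1.9, 2.9]]
thresholds = [0.5, 0.5]
m_hat = detect_number_of_signals(residuals, thresholds)

# Consistency of the two forms for m = 0, with assumed L = 3 and N = 25.
L_win, m0, N_sens = 3, 0, 25
n1, n2 = L_win * (2 * m0 + 4), L_win * (2 * N_sens - 2)
F0 = f_statistic(residuals[0], residuals[1], n1, n2)
t0_from_F = sum(math.log(1 + (n1 / n2) * f) for f in F0) / len(F0)
```

Here the first model (m = 0) is rejected because adding one signal removes most of the residual power, while adding a second signal changes it very little, so the procedure stops at m = 1.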
6. REAL DATA PROCESSING
In this section, we apply the developed techniques to real data
processing. These data were recorded by the GERESS array located in the
Bavarian Forest, Germany. Details about this array can be found in
[10]. Two teleseismic events (earthquakes), which occurred on February
13, 1993 in the Eastern Mediterranean and on February 26, 1996 in the
Middle East, respectively, were selected for our analysis. The latter
event is contaminated by a smaller pre-shock, located about 37 km from
the main event. More information about the selected events is collected
in Table 1.
The array output was sampled with $f_s = 40$ Hz. For each data set, we
used a sliding window with a length of 3.2 s and a shift of 0.5 s. A
total of seven frequency bins between 0.9 and 3.1 Hz were used. Two
independent virtual ULA sets were employed for the interpolated
root-MUSIC algorithm with the central frequency $f_c = 2.2$ Hz. The
spectral density matrix $\hat{\mathbf{R}}_x(\omega_j)$ was estimated
using L = 3 Thomson's windows, which roughly correspond to 3
independent snapshots. The sequential detection procedure kept the test
level $\alpha = 0.033$ constant in each step. Theoretical slowness
values were derived from the AK135 earth model [11].
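The processing parameters above translate into sample counts and frequency bins as follows. The sketch assumes that the frequency bins lie on the DFT grid of a window-length transform (no zero padding), which is not stated in the text; under that assumption the grid has exactly seven bins between 0.9 and 3.1 Hz, and the bin closest to 2.2 Hz is 2.1875 Hz. The 6-minute record length is taken from the figure time axes.

```python
fs = 40.0           # sampling rate in Hz (from the text)
win_s = 3.2         # window length in seconds (from the text)
shift_s = 0.5       # window shift in seconds (from the text)
f_lo, f_hi = 0.9, 3.1

n_win = round(fs * win_s)        # samples per window
n_shift = round(fs * shift_s)    # samples between consecutive window starts
df = fs / n_win                  # bin spacing in Hz, assuming an n_win-point DFT
bins = [k for k in range(n_win // 2 + 1) if f_lo <= k * df <= f_hi]
freqs = [k * df for k in bins]


def window_starts(n_samples):
    """Start indices of all full windows in a record of n_samples samples."""
    return list(range(0, n_samples - n_win + 1, n_shift))


# A 6-minute record at 40 Hz, matching the analysis interval of the figures.
starts = window_starts(int(360 * fs))
```

Each window therefore holds 128 samples and consecutive windows overlap by 108 samples, so the estimates along the time axis are strongly correlated.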
The results obtained from the weak event analysis are shown in Figs. 1
and 2. Typical seismometer outputs are plotted in the first subplot of
these figures. The second subplot shows the output of the LR-based
detector, which was used in conjunction with both techniques to allow a
fair comparison. Apparently, the P-phases are detected with a good time
resolution, while the S-phases (traveling with lower velocity) are not
detected at all. Some false alarms can be observed. The ML estimates
for the back-azimuth and velocity are well concentrated around their
theoretical values. The estimates obtained from 2-D interpolated
root-MUSIC show higher variances. Interestingly, both methods provide
better results for the azimuth than for the velocity. The relatively
poor performance of the velocity estimates may be explained by the
quite limited aperture length of GERESS.
Table 1: Event List from NEIC.
time (h.m.s)  lat (deg N)  long (deg E)  dist (deg)  az (deg)  mag (mb)  location
03:42:53      34.43        24.81         16.60       146.1     3.7       Crete
07:17:08      28.87        34.48         25.54       133.8     4.0       Gulf of Aqaba
07:17:28      28.73        34.82         25.81       133.4     5.0       Gulf of Aqaba
In Figs. 3 and 4, another event is analyzed. It contains two seismic
sources of moderate scale originating from the same location but at
slightly different times (see Table 1). In this data set, a stronger
event follows shortly after a weak event. Such a situation is of great
importance when monitoring nuclear explosions. Due to high SNR's, the
signals can be correctly detected during the whole analysis interval.
One signal is detected at about the 30th second, when waves from the
first earthquake arrive at the array. At the 57th second, the LR test
shows two signals, corresponding to the case when the superimposed
waves from the first and second seismic sources both arrive at the
array. During the period from the 300th to the 360th second (the
so-called S-phases), similar detection results can be observed as well.
The signals detected from the beginning of the analysis up to the 16th
second could be interpreted as false alarms or another weak event. The
estimates of the azimuth and velocity shown in subplots 3 and 4
illustrate that the ML technique has better robustness and lower
variance than the 2-D interpolated root-MUSIC technique. Note that the
performance of the latter method is not much better in the strong event
case than in the weak event one, since the interpolation errors become
more critical at high SNR's. Similarly to the previous example, both
methods show better azimuth estimation performance relative to that of
velocity estimation.
7. CONCLUSIONS
We compared the performance of the wideband ML and interpolated
root-MUSIC algorithms by processing weak and strong teleseismic events
recorded by the GERESS array. Our results show that ML has better
estimation accuracy and robustness relative to root-MUSIC. Another
advantage of ML is that the application of the LR test for detecting
the number of signals is straightforward. However, the enormous
computational cost associated with the ML technique may be critical in
practical applications.
Figure 1: Wideband ML, first event. "--": theoretical values for back-azimuth, "x": theoretical values for velocity. (Panel title: GERESS data: 13.02.1993 03:43 34.4N 24.8E mb = 3.7 Crete; time axis: 03:46:00 - 03:52:00 [sec].)
Figure 2: Wideband interpolated root-MUSIC, first event. "--": theoretical values for back-azimuth, "x": theoretical values for velocity. (Panel title: GERESS data: 13.02.1993 03:43 34.4N 24.8E mb = 3.7 Crete; time axis: 03:46:00 - 03:52:00 [sec].)
Figure 3: Wideband ML, second event. "--": theoretical values for back-azimuth, "x": theoretical values for velocity of the main event, "*": theoretical values for velocity of the pre-shock. (Panel title: GERESS data: 26.02.1996 07:17 28.7N 34.8E mb = 5.0 Gulf of Aqaba; time axis: 07:22:00 - 07:28:00 [sec].)
Figure 4: Wideband interpolated root-MUSIC, second event. "--": theoretical values for back-azimuth, "x": theoretical values for velocity of the main event, "*": theoretical values for velocity of the pre-shock. (Panel title: GERESS data: 26.02.1996 07:17 28.7N 34.8E mb = 5.0 Gulf of Aqaba; time axis: 07:22:00 - 07:28:00 [sec].)
REFERENCES
[1] J.F. Böhme, "Advances in spectrum analysis and array processing," in Array Processing, S. Haykin, Editor, Prentice Hall, pp. 1-63, 1991.
[2] J.F. Böhme, "Statistical array signal processing of measured sonar and seismic data," in Proc. SPIE 2563: Advanced Signal Processing Algorithms, San Diego, CA, July 1995, pp. 2-20.
[3] D.V. Sidorovich and A.B. Gershman, "Two-dimensional wideband interpolated root-MUSIC applied to measured seismic data," IEEE Trans. Signal Processing, vol. 46, pp. 2263-2267, Aug. 1998.
[4] B. Friedlander, "The root-MUSIC algorithm for direction finding with interpolated arrays," Signal Processing, vol. 30, pp. 15-29, Jan. 1993.
[5] B. Friedlander and A.J. Weiss, "Direction finding for wideband signals using an interpolated array," IEEE Trans. Signal Processing, vol. 41, pp. 1618-1634, Apr. 1993.
[6] D.V. Sidorovich, C.F. Mecklenbräuker, and J.F. Böhme, "Sequential test and parameter estimation for array processing of seismic data," in Proc. 8th IEEE Workshop Stat. Signal Array Processing, Corfu, Greece, June 1996, pp. 256-259.
[7] D.J. Thomson, "Spectrum estimation and harmonic analysis," Proc. IEEE, vol. 70, pp. 1055-1096, Sep. 1982.
[8] P. Hall, The Bootstrap and Edgeworth Expansion, Springer-Verlag, NY, 1992.
[9] C.F. Mecklenbräuker, P. Gerstoft, J.F. Böhme, and P-J. Chung, "Hypothesis testing for geoacoustic environmental models using likelihood ratio," JASA, vol. 105, pp. 1738-1748, March 1999.
[10] H.P. Harjes, "Design and siting of a new regional array in Central Europe," Bull. Seism. Soc. Am., vol. 80B, pp. 1801-1817, June 1990.
[11] B. Kennett, E.R. Engdahl, and R. Buland, "Constraints on seismic velocities in the Earth from traveltimes," Geophys. J. Int., vol. 122, pp. 108-124, 1995.
BOUNDS ON UNCALIBRATED ARRAY SIGNAL PROCESSING
Brian M. Sadler
Army Research Laboratory
Adelphi, MD 20783
bsadler@arl.mil
Richard J. Kozick
Bucknell University
Lewisburg, PA 17837
kozick@bucknell.edu
ABSTRACT
Deterministic constrained Cramér-Rao bounds (CRBs)
are developed for general linear forms in additive white
Gaussian noise. The linear form describes a variety of array
processing cases, including narrowband sources with
a calibrated array, the uncalibrated array cases of instantaneous
linear mixing and convolutive mixing, and space-time
coding scenarios with multiple transmit and receive
antennas. We employ the constrained CRB formulation of
Stoica and Ng, allowing the incorporation of side information
into the bounds. This provides a framework for a large
variety of scenarios, including semi-blind, constant modulus,
known moments or cumulants, and others. The CRBs
establish bounds on blind estimation of sources using an
uncalibrated array, and facilitate comparison of calibrated
and uncalibrated arrays when side information is exploited.
1. INTRODUCTION: MODEL
Consider the additive noise linear model
$$\mathbf{x}_t = H \mathbf{s}_t + \mathbf{v}_t, \quad t = 1, \ldots, N, \tag{1}$$
where $\mathbf{x}_t$ is $l \times 1$ and $H$ is $l \times k$. The elements of the $k \times 1$
signal vector will be denoted by $\mathbf{s}_t = [s_1(t), \ldots, s_k(t)]^T$.
We use the superscripts $T$, $*$, and $H$ for transpose, conjugate,
and conjugate transpose, respectively, with complex
numbers denoted $c = \bar{c} + j\tilde{c}$. The noise $\mathbf{v}_t$ is assumed complex
white Gaussian, with variance $\sigma^2$. The model (1) underlies
many array processing and single-sensor scenarios.
In the narrowband calibrated array case ($l$ sensors and
$k$ sources), $H = A(\theta) \cdot \alpha$ is of known parametric form with
respect to the source bearings. Here $A(\theta)$ is the array manifold
matrix, and $\alpha = \mathrm{diag}(\alpha_1, \ldots, \alpha_k)$ contains complex
constants $\alpha_i$ that model the channel attenuation for the $i$th
source. Constrained bounds are developed for this case in
[1, 2].
In this paper we are interested in the general case when
$H$ is unknown. This arises in the uncalibrated array cases
of instantaneous linear mixing and convolutive mixing, and
the space-time transmit diversity case with arrays for both
transmission and reception. An uncalibrated array may
have unknown sensor placement, phase mismatch, and
so on. In such cases blind methods may be used to separate
and estimate source waveforms without estimating the
source bearings. Performance bounds are not straightforward
due to the lack of regularity in the Fisher information
matrix (FIM) associated with (1) in the uncalibrated case.
We develop CRBs for these cases using the constrained CRB
methodology of Gorman/Hero and Stoica/Ng [3]. The constraints
arise due to side information such as constant modulus
sources, constraints on the structure and elements of
$H$, and semi-blind sources (some known signal values). Examples
are given comparing calibrated and uncalibrated array
CRBs. A space-time coding example is also presented.
2. FIM & CONSTRAINED CRBS
Forming the $lN \times 1$ supervector $\mathbf{x} = [\mathbf{x}_1^T, \ldots, \mathbf{x}_N^T]^T$, then
$\mathbf{x} \sim \mathcal{CN}(\boldsymbol{\mu}_x, \sigma^2 I_{lN \times lN})$, where
$$\boldsymbol{\mu}_x = E[\mathbf{x}] = [\boldsymbol{\mu}_1^T, \ldots, \boldsymbol{\mu}_N^T]^T, \quad \boldsymbol{\mu}_t = H \mathbf{s}_t. \tag{2}$$
Thus we have a multivariate complex normal process with
deterministic time-varying mean $H \mathbf{s}_t$. We define the data
matrix and the columns of $H$ as
$$S = [\mathbf{s}_1, \ldots, \mathbf{s}_N]_{k \times N}, \quad H = [\mathbf{h}_1, \ldots, \mathbf{h}_k]. \tag{3}$$
We write the unknown deterministic parameters in a real
vector of length $2lk + 2kN$, given by
$$\Theta = [\Theta_H^T, \Theta_S^T]^T, \quad \Theta_H = [\bar{\mathbf{h}}_1^T, \tilde{\mathbf{h}}_1^T, \ldots, \bar{\mathbf{h}}_k^T, \tilde{\mathbf{h}}_k^T]^T, \quad \Theta_S = [\bar{\mathbf{s}}_1^T, \tilde{\mathbf{s}}_1^T, \ldots, \bar{\mathbf{s}}_N^T, \tilde{\mathbf{s}}_N^T]^T. \tag{4}$$
Note that $\sigma^2$ decouples from the other parameters, and so
it is omitted.
The FIM $J$ for $\Theta$ is obtained from
$$[J(\Theta)]_{ij} = \frac{2}{\sigma^2} \mathrm{Re}\left\{ \frac{\partial \boldsymbol{\mu}_x^H}{\partial \Theta_i} \frac{\partial \boldsymbol{\mu}_x}{\partial \Theta_j} \right\}. \tag{5}$$
Partitioning $J$ we write
$$J = \begin{bmatrix} J_H & J_{HS} \\ J_{SH} & J_S \end{bmatrix}, \tag{6}$$
with elements described next. Define the $2k \times 2k$ matrix
$$J_0 = \begin{bmatrix} H^H H & jH^H H \\ -jH^H H & H^H H \end{bmatrix}, \tag{7}$$
then $J_S$ is given by the block-diagonal $2kN \times 2kN$ matrix
$$J_S = \frac{2}{\sigma^2} \mathrm{Re}\{\mathrm{diag}(J_0, \ldots, J_0)\}, \tag{8}$$
where $J_0$ repeats $N$ times.
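The lack of regularity of the unconstrained FIM can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: it builds the Jacobian of the stacked mean with respect to the real parameter vector of (4) and forms the FIM as in (5). The function name `fim_linear_model` and the random test values are illustrative assumptions.

```python
import numpy as np

def fim_linear_model(H, S, sigma2=1.0):
    """FIM of x_t = H s_t + v_t for the real parameter ordering
    [Re h_1, Im h_1, ..., Re h_k, Im h_k, Re s_1, Im s_1, ..., Re s_N, Im s_N].
    Hypothetical helper written for illustration only."""
    l, k = H.shape
    N = S.shape[1]
    blocks = []
    # d(mean)/d(Re h_m): the block for time t equals s_m(t) * I_l.
    for m in range(k):
        d_re = np.kron(S[m, :].reshape(N, 1), np.eye(l))
        blocks += [d_re, 1j * d_re]          # the Im part picks up a factor j
    # d(mean)/d(Re s_t): H in the block for time t, zero elsewhere.
    for t in range(N):
        d_re = np.zeros((l * N, k), dtype=complex)
        d_re[t * l:(t + 1) * l, :] = H
        blocks += [d_re, 1j * d_re]
    D = np.hstack(blocks)                    # lN x (2lk + 2kN) complex Jacobian
    return (2.0 / sigma2) * np.real(D.conj().T @ D)

rng = np.random.default_rng(0)
l, k, N = 3, 2, 5
H = rng.standard_normal((l, k)) + 1j * rng.standard_normal((l, k))
S = rng.standard_normal((k, N)) + 1j * rng.standard_normal((k, N))
J = fim_linear_model(H, S)
print("dimension:", J.shape[0], "rank:", np.linalg.matrix_rank(J))
```

For generic $H$ and $S$ the rank falls short of the dimension by $2k^2$, which corresponds to the ambiguity $H \to HM$, $S \to M^{-1}S$ for any invertible complex $k \times k$ matrix $M$.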
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
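$J_H$ may be written as follows,
$$P = [S^* S^T]_{k \times k}, \tag{9}$$
$$B_{mn} = \begin{bmatrix} [P]_{mn} & j[P]_{mn} \\ -j[P]_{mn} & [P]_{mn} \end{bmatrix} \otimes I_{l \times l}, \tag{10}$$
$$J_H = \frac{2}{\sigma^2} \mathrm{Re} \begin{bmatrix} B_{11} & \cdots & B_{1k} \\ \vdots & & \vdots \\ B_{k1} & \cdots & B_{kk} \end{bmatrix}_{2lk \times 2lk}, \tag{11}$$
where $\otimes$ denotes Kronecker product.
Next we consider the cross-terms in the FIM, $J_{HS}$ and
$J_{SH}$. It can be shown that
$$L_{mn} = \begin{bmatrix} [S]^*_{mn} & j[S]^*_{mn} \\ -j[S]^*_{mn} & [S]^*_{mn} \end{bmatrix} \otimes H, \tag{12}$$
$$J_{HS} = \frac{2}{\sigma^2} \mathrm{Re} \begin{bmatrix} L_{11} & \cdots & L_{1N} \\ \vdots & & \vdots \\ L_{k1} & \cdots & L_{kN} \end{bmatrix}_{2lk \times 2kN} = J_{SH}^T. \tag{13}$$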
As noted, the FIM $J$ is generally not invertible because
the model parameters are not identifiable, and so no unbiased
estimator for $\Theta$ exists. However, it is possible to
achieve identifiability, and then regularity of the FIM, by
establishing constraints on $\Theta$. We establish $K$ equality
constraints on elements of $\Theta$, where $K < \dim(\Theta)$. The
constraints have the form $f_i(\Theta) = 0$ for $i = 1, \ldots, K$. Define a
$K \times 1$ constraint vector $f(\Theta)$, and a corresponding $K \times M$
gradient matrix, with $M = \dim(\Theta)$,
$$F(\Theta) = \frac{\partial f(\Theta)}{\partial \Theta^T} \tag{14}$$
with elements $[F(\Theta)]_{i,m} = \partial f_i(\Theta)/\partial [\Theta]_m$. The gradient
matrix $F(\Theta)$ is assumed to have full row rank $K$ for any
$\Theta$ satisfying the constraints $f_1(\Theta) = 0, \ldots, f_K(\Theta) = 0$. Then, the
constrained CRB is obtained via (Theorem 1 of [3])
$$E[(\hat{\Theta} - \Theta)(\hat{\Theta} - \Theta)^T] \geq U (U^T J U)^{-1} U^T. \tag{15}$$
$J$ is the unconstrained FIM from (5), and $U$ is an orthonormal
basis for the null space of $F(\Theta)$, i.e., $FU = 0$ and
$U^T U = I$. Note that $U$ is a function of the constraints only.
Examples of source constraints of interest include constant
modulus (CM) sources, known source cumulant or
kurtosis, and semi-blind sources (some known source samples).
Constraints may also be placed on $H$, such as limiting
the norm of $H$. Together, sufficient constraints may
be found to ensure information regularity. These provide
CRBs on symbol estimation in blind source separation scenarios
that exploit source features such as CM. We may also
compare bounds on source estimation for both calibrated
and uncalibrated arrays using the results of [1, 2], where
we have established CRBs on bearing, symbol, and channel
estimation for calibrated arrays with side information.
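The bound (15) is simple to evaluate once $J$ and the constraint gradient $F$ are available. The sketch below is one possible NumPy implementation under the assumption that an orthonormal null-space basis $U$ is taken from the SVD of $F$; the function name `constrained_crb` and the two-parameter toy problem are illustrative and do not come from the paper.

```python
import numpy as np

def constrained_crb(J, F, tol=1e-10):
    """Evaluate U (U^T J U)^{-1} U^T, with U an orthonormal basis of null(F).
    Hypothetical helper; J may be singular as long as U^T J U is invertible."""
    J = np.asarray(J, dtype=float)
    F = np.atleast_2d(np.asarray(F, dtype=float))
    _, s, Vt = np.linalg.svd(F)              # full SVD: Vt is M x M
    rank = int(np.sum(s > tol * s.max()))
    U = Vt[rank:].T                          # columns span the null space of F
    return U @ np.linalg.inv(U.T @ J @ U) @ U.T

# Toy problem: only the sum a + b is observed, so J is singular.
J = np.array([[1.0, 1.0], [1.0, 1.0]])
F = np.array([[1.0, -1.0]])                  # constraint a - b = 0
print(constrained_crb(J, F))
```

In the toy problem the unconstrained FIM has rank one, yet the single constraint restores identifiability and the bound is finite.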
3. EXAMPLES IN ARRAY PROCESSING
We use the constrained CRB formulation to gain insight
into the following questions.
1. Which provides more accurate signal copy: an uncalibrated
array (unknown $H$ matrix in (1)) with CM
signals, or a calibrated array ($H = A(\theta) \cdot \alpha$) with
unconstrained signals?
2. Algorithms for blind beamforming with uncalibrated
arrays often exploit independence between the signals
and non-Gaussianity as characterized by the kurtosis
[4, 5, 6]. What is the relative value of these constraints
when compared with the CM constraint for
CM signals? Do the CRBs based on kurtosis constraints
imply any difference in separability of CM
and QAM signals?
We generate observations $\mathbf{x}_1, \ldots, \mathbf{x}_N$ in (1) using a complex
narrowband array model in which $H = A(\theta) \cdot \alpha$, where
$A(\theta) = [\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_k)]$ is the array response matrix,
$\theta = [\theta_1, \ldots, \theta_k]^T$ are the source angles of arrival (AOAs), $\mathbf{a}(\theta_i)$
is the array manifold, and $\alpha = \mathrm{diag}\{\alpha_1, \ldots, \alpha_k\}$ is a diagonal
complex channel gain matrix. We consider a uniform
linear array (ULA) with omnidirectional sensors and
half-wavelength spacing, so the array manifold elements are
$[\mathbf{a}(\theta)]_m = \exp[j\pi(m - 1)\sin\theta]$, $m = 1, \ldots, l$.
Consider a particular ULA with $l = 5$ sensors and $k = 2$
sources with AOAs $\theta_1 = 0°$ and $\theta_2$ varying from 1° to
30°, where the AOAs are measured with respect to the
array broadside. The noise variance is $\sigma^2 = 1$, and the
number of time samples is $N = 100$. The complex amplitudes
$\alpha_1$ and $\alpha_2$ are generated with phase shifts $\angle\alpha_1 = [...]$
and $\angle\alpha_2 = -[...]$ rad. The amplitudes $|\alpha_1|$ and $|\alpha_2|$
are chosen to achieve a desired sample SNR, defined as
$\mathrm{SNR}_i = |\alpha_i|^2 C_{21}(i)/\sigma^2$, where the sample variance of signal
$i$ is $C_{21}(i) = (1/N)\sum_{t=1}^{N} |s_i(t)|^2$. $\mathrm{SNR}_1$ is fixed at
10 dB, while $\mathrm{SNR}_2$ is evaluated at 5, 10, and 15 dB. One
beamwidth for the array is 23.6° at broadside.
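As a rough illustration of this setup, the sketch below builds the half-wavelength ULA response matrix and chooses channel gain magnitudes that give a requested sample SNR. The function names and the phase values of $\pm\pi/4$ in the usage lines are assumptions made only for the example; the phase shifts used in the paper are not recoverable from the text above.

```python
import numpy as np

def ula_steering(theta_deg, l):
    """Array response A(theta) for a half-wavelength ULA.
    Element m (1-based) of each column is exp(j*pi*(m-1)*sin(theta))."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    m = np.arange(l).reshape(-1, 1)
    return np.exp(1j * np.pi * m * np.sin(theta))      # l x k

def gains_for_snr(S, snr_db, phases, sigma2=1.0):
    """Diagonal gain matrix alpha with |alpha_i|^2 * C21(i) / sigma2 = SNR_i."""
    c21 = np.mean(np.abs(S) ** 2, axis=1)               # sample variance per source
    mag = np.sqrt(sigma2 * 10.0 ** (np.asarray(snr_db) / 10.0) / c21)
    return np.diag(mag * np.exp(1j * np.asarray(phases)))

rng = np.random.default_rng(0)
S = np.exp(1j * 2 * np.pi * rng.integers(0, 8, size=(2, 100)) / 8)   # 8-PSK
A = ula_steering([0.0, 10.0], l=5)
alpha = gains_for_snr(S, [10.0, 5.0], [np.pi / 4, -np.pi / 4])       # assumed phases
H = A @ alpha                                                        # mixing matrix in (1)
```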
3.1. Calibrated vs. uncalibrated arrays
The constrained CRB for a calibrated array in which $H$ has
the structure $A(\theta) \cdot \alpha$ is presented elsewhere [2]. Here we
compare the calibrated array CRBs with the uncalibrated
array CRBs outlined in the previous section (5), (15). The
signal vectors $\mathbf{s}_1, \ldots, \mathbf{s}_N$ are 8-PSK waveforms with unit
modulus $|s_i(t)| = 1$ and phase rotation such that $\mathbf{s}_1 =
[1, \ldots, 1]^T$. For the case of unconstrained mixing matrix
$H$, it is known [7] that the CM signal constraint and the
specified phase rotation are sufficient to uniquely identify
$H$ and the signal phases $\angle s_i(t)$. For the case of a calibrated
ULA, it is well-known that the AOAs $\theta$ and signals $s_i(t)$ are
identifiable with no signal constraints ("blind" signals).
Figure 1(a) contains the mean CRB on the signal phase
parameters $\angle s_i(2), \ldots, \angle s_i(N)$ for sources $i = 1, 2$ and various
constraints on the structure of $H$ and the signals $\mathbf{s}_t$.
Note that as the source spacing decreases to less than one
beamwidth, the constraints of CM signals with an uncalibrated
array (unknown $H$) potentially provide more accuracy
in signal phase than a calibrated array with blind
signals. Further, the o and x symbols are coincident on
the plots. So for CM signals, a calibrated array provides
negligible improvement in signal phase accuracy compared
with an uncalibrated array that places no constraints on
$H$. This example adds further testament to the well-known
power of the CM signal constraint for signal separation.
3.2. Uncalibrated array and moment constraints
The following constraints on the signal moments are common
in blind beamforming algorithms, e.g., [4]-[6]:
$$\frac{1}{N} S S^H = \text{known matrix, typically } I \tag{16}$$
$$C_{20}(i) = \frac{1}{N} \sum_{t=1}^{N} s_i(t)^2 \text{ is known}, \quad i = 1, \ldots, k \tag{17}$$
$$m_{42}(i) = \frac{1}{N} \sum_{t=1}^{N} |s_i(t)|^4 \text{ is known}, \quad i = 1, \ldots, k. \tag{18}$$
These are sample moments and not expectations. Note (16)
expresses that the signals are uncorrelated, and the diagonal
elements of (16) constrain the signal sample variances
$C_{21}(i) = 1$. Then (16)-(18) imply that the signal sample
kurtoses $C_{42}(i) = m_{42}(i) - |C_{20}(i)|^2 - 2C_{21}(i)^2$ are known.
We will refer to (16)-(18) as "moment constraints," and we
further assume that the first sample of each source signal $s_i$
is known in order to obtain an invertible constrained FIM.
We consider two types of signals: both source signals are
8-PSK (CM), and both source signals are 64-QAM.
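The sample moments in (16)-(18) and the resulting sample kurtosis are easy to compute. The following minimal sketch, with the hypothetical helper name `sample_moments`, evaluates them for one period of an 8-PSK sequence and for the full unit-power 64-QAM constellation; using each constellation point exactly once is an assumption made so that the sample moments are deterministic.

```python
import numpy as np

def sample_moments(s):
    """Return (C20, C21, m42, C42) for a complex sequence s, where
    C42 = m42 - |C20|^2 - 2*C21^2 is the sample kurtosis."""
    s = np.asarray(s, dtype=complex)
    c20 = np.mean(s ** 2)
    c21 = np.mean(np.abs(s) ** 2)
    m42 = np.mean(np.abs(s) ** 4)
    c42 = m42 - np.abs(c20) ** 2 - 2.0 * c21 ** 2
    return c20, c21, m42, c42

# One pass through each constellation (illustrative sequences).
psk8 = np.exp(1j * 2 * np.pi * np.arange(8) / 8)
levels = np.arange(-7, 8, 2)
qam64 = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(42.0)

for name, seq in [("8-PSK", psk8), ("64-QAM", qam64)]:
    c20, c21, m42, c42 = sample_moments(seq)
    print(name, "C21 =", round(c21, 3), "C42 =", round(c42, 3))
```

Both sequences have unit sample variance and zero $C_{20}$, so the kurtosis is the moment that separates them.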
Figures 1(b)-(d) contain constrained CRBs for this scenario.
For the CM signals, we have also included on the
plots the CRBs based on the CM signal constraints $|s_i(t)| = 1$,
$t = 2, \ldots, N$, $i = 1, \ldots, k$. The CM signal constraints
are exploited by some blind beamforming algorithms, e.g.,
ACMA [7].
Figure 1(b) contains mean CRBs for the elements of the
$H$ matrix. In the bottom panel, in which source 2 is strong
($\mathrm{SNR}_2 = 15$ dB), the moment constraints and the CM constraints
yield about the same CRBs for most values of $\theta_2$.
In difficult scenarios where the sources become very closely
spaced (less than 10°), the CM signal constraint becomes
more informative than the moment constraints. Similar behavior
is exhibited in the top panel of Figure 1(b): source 2
is weaker ($\mathrm{SNR}_2 = 5$ dB), so the CM constraints are more
informative than the moment constraints over a larger range
of AOA spacings. Note also that if only moment constraints
are used, QAM signals provide lower CRBs on $H$ than CM
signals for this case.
Mean CRBs for estimation of the signals $\mathbf{s}_2, \ldots, \mathbf{s}_N$ are
shown in Figures 1(c) and (d). Source 2 is weaker in Figure 1(c)
than in Figure 1(d), and we have also included the
CRBs for signal estimation when the $H$ matrix is known
perfectly (marked with boxes) but no signal constraints are
applied (the blind, calibrated case). In difficult situations
of low SNR and closely-spaced sources, exploiting the CM
property provides the potential for better performance compared
with the moment constraints. Note that the CRBs for
signal moment constraints and unconstrained $H$ are approximately
equal to the CRBs for known mixing matrix $H$ and
unconstrained signals, which is similar to our observations
about calibrated vs. uncalibrated arrays in Section 3.1.
4. SPACE-TIME CODING
Space-time coding employs multiple antennas on transmit
and receive [8]. In the flat fading case the model of (1) arises
with $k$ transmit and $l$ receive antennas, where $\mathbf{s}_t$ is the $k \times 1$
code vector transmitted by the $k$ antennas at time $t$, and
$[H]_{ij}$ is the complex fading channel gain from the $j$th transmit
antenna to the $i$th receive antenna. The independent
Rayleigh fading model corresponds to the $[H]_{ij}$ being independent,
complex, Gaussian random variables with zero
mean and unit variance. Suppose that the signal constellation
is assumed to have average energy equal to one, and
let $E_s$ denote the total energy transmitted from all $k$ antennas
per symbol. Then we use $\sqrt{E_s/k} \cdot H$ in the model
(1), yielding an average SNR per receive antenna equal to
$E_s/\sigma^2$ for independent, flat, Rayleigh fading channels.
The model (1) assumes that the fading coefficients $[H]_{ij}$
are constant over the block of $N$ symbol times. The constrained
CRBs developed in this paper assume that $H \mathbf{s}_t$
in (1) is deterministic, so constrained CRBs may be computed
for a particular realization of the fading matrix $H$.
In the example presented next, we average the CRBs from
multiple independent realizations of $H$ to investigate the
diversity gain that results from various constraints.
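The channel scaling can be illustrated with a short simulation. This is a sketch under the stated model only (independent, zero-mean, unit-variance complex Gaussian gains and unit-energy symbols); the function name `fading_channel` and the Monte Carlo settings are assumptions made for the example.

```python
import numpy as np

def fading_channel(l, k, Es, rng):
    """Draw sqrt(Es/k) * H with i.i.d. CN(0, 1) entries in H (flat Rayleigh fading)."""
    H = (rng.standard_normal((l, k)) + 1j * rng.standard_normal((l, k))) / np.sqrt(2.0)
    return np.sqrt(Es / k) * H

rng = np.random.default_rng(0)
l, k, Es, sigma2 = 2, 2, 4.0, 1.0
# With unit-energy symbols, the received signal power at antenna i is sum_j |G_ij|^2.
power = np.mean([np.sum(np.abs(fading_channel(l, k, Es, rng)) ** 2, axis=1)
                 for _ in range(5000)])
print("average SNR per receive antenna:", power / sigma2, "target:", Es / sigma2)
```

Averaging over realizations in this way mirrors how the CRBs are averaged over independent fading matrices in the example that follows.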
4.1. Constraints
As an example, consider the two-transmit antenna space-time
coding scheme in [9]. The code in [9] for $k = 2$ transmitters
can be expressed via the signal constraints
$$\mathbf{s}_{t+1} = P \mathbf{s}_t^*, \quad t = 1, 3, \ldots, N - 1 \ (N \text{ even}), \tag{19}$$
where
$$P = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \tag{20}$$
so a total of two complex symbols are encoded in $\mathbf{s}_t$ and
$\mathbf{s}_{t+1}$. Sampling at the symbol rate is assumed, and this encoding
leads to a simple linear receiver structure for maximum
likelihood (ML) symbol detection. The ML detector
requires knowledge of the channel matrix $H$, and training
samples are suggested in [9] for estimation of $H$. We investigate
bounds on estimation of the signals $\mathbf{s}_t$ in the space-time
coding context with $T < N$ training symbols (semi-blind),
the code (19), and other constraints including CM signals
and known $H$ matrix.
Suppose that the first $T$ symbols $\mathbf{s}_1, \ldots, \mathbf{s}_T$ transmitted
from both antennas are known, and assume that $T < N$
with $T$ and $N$ even. Then the gradient matrix (14) corresponding
to the $T$ training symbols (semi-blind) and the
space-time code (19) for samples $T + 1, \ldots, N$ has the form
$$F_1 = \begin{bmatrix} 0_{k(T+N) \times 2lk} & \mathrm{diag}(I_{2kT \times 2kT}, F_0, \ldots, F_0) \end{bmatrix}, \tag{21}$$
where $F_0$ repeats $(N - T)/2$ times and equals
$$F_0 = \begin{bmatrix} \begin{matrix} P & 0_{2 \times 2} \\ 0_{2 \times 2} & -P \end{matrix} & -I_{4 \times 4} \end{bmatrix}. \tag{22}$$
Figure 1: Source 1 bearing is fixed at $\theta_1 = 0°$, source 2 bearing is varied on [1°, 30°]. (a) Uncalibrated vs. calibrated arrays:
CRB on signal phase estimation for 8-PSK signals. (b) Mean CRB on elements of $H$ matrix for 8-PSK (CM) signals and
64-QAM signals for various constraints. (c)-(d): Mean CRB for signals with (c) $\mathrm{SNR}_1 = 10$ dB and $\mathrm{SNR}_2 = 5$ dB and (d)
$\mathrm{SNR}_2 = 15$ dB. (Vertical axes: RMS error; in degrees for the phase panels.)
The constraints characterized by (21) will be denoted 'SEMI-BLIND
& S-T CODE' in the example below. We also consider
other combinations of constraints. 'SEMI-BLIND' includes
training symbols $\mathbf{s}_1, \ldots, \mathbf{s}_T$ that could be used to
jointly estimate $H$ and the unknown signals $\mathbf{s}_{T+1}, \ldots, \mathbf{s}_N$,
but the space-time code is not exploited. We can apply the
constraint that the $N - T$ unknown signals are CM, i.e.,
$|s_i(t)| = 1$, $i = 1, \ldots, k$, $t = T + 1, \ldots, N$. We can also
apply the constraint of known $H$ matrix, which provides a
basis for evaluating the effectiveness of the $T$ training symbols
for estimation of $H$.
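The code constraint (19) and the gradient of its real and imaginary parts can be checked with a few lines of NumPy. This is a sketch only: the function names are hypothetical, and the sign convention chosen for the constraint, $f = P\mathbf{s}_t^* - \mathbf{s}_{t+1}$ with parameter ordering $[\bar{\mathbf{s}}_t, \tilde{\mathbf{s}}_t, \bar{\mathbf{s}}_{t+1}, \tilde{\mathbf{s}}_{t+1}]$, is an assumption that is consistent with (19) but not necessarily the convention used in the paper.

```python
import numpy as np

P = np.array([[0.0, -1.0], [1.0, 0.0]])

def encode_pairs(symbols):
    """Map a 2 x M array of symbol pairs to the 2 x 2M code matrix,
    with odd-time columns s_t and even-time columns s_{t+1} = P conj(s_t)."""
    symbols = np.asarray(symbols, dtype=complex)
    S = np.empty((2, 2 * symbols.shape[1]), dtype=complex)
    S[:, 0::2] = symbols
    S[:, 1::2] = P @ symbols.conj()
    return S

def code_residual(s_t, s_t1):
    """Real constraint vector [Re; Im] of P conj(s_t) - s_{t+1} (zero for valid code pairs)."""
    r = P @ np.conj(s_t) - s_t1
    return np.concatenate([r.real, r.imag])

# Gradient of code_residual w.r.t. [Re s_t, Im s_t, Re s_{t+1}, Im s_{t+1}].
Z = np.zeros((2, 2))
F0 = np.hstack([np.block([[P, Z], [Z, -P]]), -np.eye(4)])

rng = np.random.default_rng(0)
sym = np.exp(1j * 2 * np.pi * rng.integers(0, 8, size=(2, 3)) / 8)   # 8-PSK pairs
S = encode_pairs(sym)
print(np.allclose(code_residual(S[:, 0], S[:, 1]), 0.0))
```

Because the constraint is linear in the real parameters, the gradient block is constant, which is why the same $F_0$ repeats along the block diagonal for every code pair.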
4.2. Example
Consider an example with $k = 2$ transmit antennas, $l = 2$
receive antennas, independent Rayleigh fading, and $N = 50$
time samples with $T = 2$ training symbols. The fading is
assumed to be constant over the block of $N$ symbol times.
The SNR per receive antenna is varied over the range 0 to
20 dB, and the constrained CRBs are averaged over 500
independent fading matrices $H$ for each SNR value. The
signals are 8-PSK, and the transmitted signals satisfy the
space-time code constraint (19). For each realization of $H$,
we compute CRBs on the signal phases $\angle \mathbf{s}_{T+1}, \ldots, \angle \mathbf{s}_N$
subject to various constraints, and these CRBs are averaged
to obtain mean CRBs for the realization.
Figure 2 contains constrained CRBs on signal phase
estimation for various constraints. The space-time code
structure (19) is present in the transmitted signals, but
it is only enforced in the constraints labeled 'S-T CODE'.
When the space-time code is not applied, the CRB corresponds
to independent estimation of the transmitted sequences
$s_1(T + 1), \ldots, s_1(N)$ and $s_2(T + 1), \ldots, s_2(N)$, so
diversity gain is impossible. We make the following observations
from Figure 2.
• Comparing 'KNOWN H' with 'KNOWN H & S-T
CODE' shows a potential diversity gain of approximately
10 dB in SNR provided by the space-time
code when $H$ is known exactly.
• Comparing 'SEMI-BLIND & S-T CODE' with
'KNOWN H & S-T CODE' shows that using $T = 2$ training
symbols to estimate $H$ costs approximately 3 dB
in SNR compared with exact knowledge of $H$.
• The 'SEMI-BLIND & CM & S-T CODE' curve shows
that exploiting CM in addition to the training and
space-time code potentially yields about 1.5 dB gain
in SNR.
• For the cases in which the space-time code constraint
is not exploited, the 'SEMI-BLIND & CM' constraint
provides approximately 2 dB gain compared with
'KNOWN H', which does not exploit the CM property.
Note that the constrained CRBs on $\angle \mathbf{s}_t$ pertain to estimation
of the signals, while the primary quantity of interest
in digital communication is probability of detection error.
Smaller CRBs suggest the potential for reduced probability
of detection error in practical receivers.
Figure 2: CRBs for signal phase estimation in space-time
coding scenario with $k = 2$ transmitters, $l = 2$ receivers,
$N = 50$ time samples, and independent Rayleigh fading
with various constraints. (Panel title: Mean CRB on signal phase, T = 2 training symbols.)
5. REFERENCES
[1] B.M. Sadler, R.J. Kozick, T. Moore, "Bounds on constant modulus and semi-blind array processing," Proc. CISS'2000, March 2000.
[2] B.M. Sadler, R.J. Kozick, T. Moore, "Performance bounds on bearing and symbol estimation for communication signals with side information," Proc. ICASSP 2000, June 2000.
[3] P. Stoica, B.C. Ng, "On the Cramér-Rao bound under parametric constraints," IEEE Sig. Proc. Letters, vol. 5, no. 7, pp. 177-179, July 1998.
[4] J.F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proc. F, vol. 140, no. 6, pp. 362-370, Dec. 1993.
[5] P. Comon, "Independent component analysis, A new concept?", Signal Processing, vol. 36, pp. 287-314, 1994.
[6] J. Sheinvald, "On blind beamforming for multiple non-Gaussian signals and the constant-modulus algorithm," IEEE Trans. Signal Processing, vol. 46, no. 7, pp. 1878-1885, July 1998.
[7] A.-J. van der Veen, A. Paulraj, "An analytical constant modulus algorithm," IEEE Trans. Signal Processing, vol. 44, no. 5, pp. 1136-1155, May 1996.
[8] A.F. Naguib, N. Seshadri, A.R. Calderbank, "Increasing data rate over wireless channels," IEEE Sig. Proc. Mag., May 2000.
[9] S.M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. on Selected Areas in Comm., Oct. 1998.
ARRAY PROCESSING IN THE PRESENCE OF UNKNOWN NONUNIFORM
SENSOR NOISE: A MAXIMUM LIKELIHOOD DIRECTION
FINDING ALGORITHM AND CRAMER-RAO BOUNDS
Marius Pesavento and Alex B. Gershman
Department of ECE, McMaster University
Hamilton, Ontario, L8S 4K1 Canada
gershman@ieee.org
ABSTRACT
We address the problem of estimating Directions Of Arrival
(DOA's) of multiple sources observed on the background
of nonuniform white noise with an arbitrary unknown diagonal
covariance matrix. A new deterministic Maximum
Likelihood (ML) DOA estimator is derived. Its implementation
is based on an iterative procedure which includes
stepwise concentration of the Log-Likelihood (LL) function
with respect to the signal and noise nuisance parameters
and requires only a few iterations to converge.
New closed-form expressions for the deterministic and
stochastic direction estimation Cramér-Rao bounds (CRB's)
are derived for the considered nonuniform model. Our expressions
can be viewed as an extension of the well-known
results by Stoica and Nehorai, and Weiss and Friedlander
to a more general noise model than the commonly used uniform
one. Simulation and experimental (seismic data processing)
results illustrate the performance of the estimator
and validate our theoretical analysis.
1. INTRODUCTION
ML DOA estimation techniques are known to have excellent
asymptotic and threshold performance [1], [2]. The key
assumption used for the derivation of both the deterministic
and stochastic ML estimators is the so-called uniform
white noise assumption [1]. According to it, sensor noises
are presumed to form a zero-mean Gaussian process with
the covariance matrix $\sigma^2 \mathbf{I}$, where $\sigma^2$ is the unknown noise
variance, and $\mathbf{I}$ is the identity matrix. This simple assumption
makes it possible to concentrate the resulting LL function with
respect to both signal waveform and noise nuisance parameters,
and, therefore, to reduce the dimension of the parameter
space and the associated computational burden [1].
Clearly, the uniform noise assumption may be unrealistic
in certain applications [3]-[6], where the noise environment
remains unknown or changes slowly with time. In
the general case, the sensor noise should be considered as
an unknown colored (i.e. spatially dependent) process. Recently,
several advanced ML techniques have been proposed
which exploit the ideas of colored noise modeling [6]-[8].
In some practical applications (for example, when the
so-called sparse arrays are used), the general colored noise
assumption can be simplified by assuming the sensor noise
to be spatially white [4], [5]. In this case, the noise spatial
covariance structure can still be represented by a diagonal
matrix, but the sensor noise variances are no longer identical
to one another. Such a noise model becomes relevant
in situations with hardware nonidealities in receiving channels
[9] as well as for sparse arrays with prevailing external
noise (for example, reverberation noise in sonar or external
seismic noise) [4], [5].
This work was supported by the Natural Sciences and Engineering
Research Council (NSERC) of Canada.
It is important to stress that if sensor noise is a spatially
nonuniform white process, neither the conventional
"uniform" ML methods [1]-[2], nor the colored noise modeling
ML techniques [6]-[8] may be expected to give satisfactory
results, because the former methods will mismodel the
noise, whereas the latter techniques will ignore important
prior knowledge that the noise process is spatially white.
This appears to be a strong motivation to develop direction
finding techniques for the nonuniform white noise case.
Moreover, the majority of the ML colored noise modeling
based approaches developed so far are unable to concentrate
the LL function with respect to the noise parameters
[7]. As a result, such techniques may be computationally
demanding. The use of the nonuniform white noise model
can be expected to overcome this drawback by means of
obtaining "concentrated" solutions to the ML estimation
problem.
The motivation given shows that the nonuniform white
noise case can be viewed as a practically important generalization
of the simpler uniform model. In the present paper,
we derive a new iterative deterministic ML estimator,
which concentrates the LL function with respect to both
signal and noise nuisance parameters. Unlike the analytic
concentration used in the conventional "uniform" ML estimators,
the concentration of the LL function in the nonuniform
noise case will be performed in a numerical (iterative)
manner, with only a few iterations necessary for convergence.
Furthermore, we derive closed-form expressions for the
deterministic and stochastic direction estimation CRB's for
the considered nonuniform white noise case. These expressions
can be viewed as a natural extension of the well-known
results reported in [1]-[2] and [10] for the uniform noise
model. The estimation performance of the proposed ML
technique is compared to the derived CRB's and the performance
of the deterministic uniform ML estimator [1] via
computer simulations. Moreover, we test both the uniform
and nonuniform ML techniques using experimental seismic
data recorded by the GERESS array (Germany). Our simulations
and the results of real data processing demonstrate
essential performance improvements achieved by means of
the proposed nonuniform ML estimator relative to the conventional
uniform ML algorithm. Additionally, the experimental
results provide a solid verification of the practical
relevance of the considered nonuniform noise model.
2. SIGNAL MODEL
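Let an array of $n$ sensors receive $q$ ($q < n$) narrowband
signals impinging from the sources with unknown DOA's
$\theta_1, \ldots, \theta_q$. The $i$th snapshot vector of sensor array outputs
can be modeled as [1]-[3]
$$\mathbf{x}(i) = \mathbf{A}(\boldsymbol{\theta})\mathbf{s}(i) + \mathbf{n}(i), \quad i = 1, \ldots, N \tag{1}$$
where $\mathbf{A}(\boldsymbol{\theta}) = [\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_q)]$ is the $n \times q$ matrix composed
of the signal direction vectors $\mathbf{a}(\theta_i)$ ($i = 1, \ldots, q$),
$\boldsymbol{\theta} = [\theta_1, \ldots, \theta_q]^T$ is the $q \times 1$ vector of the unknown signal
DOA's, $\mathbf{s}(i)$ is the $q \times 1$ vector of the source waveforms, $\mathbf{n}(i)$
is the $n \times 1$ vector of white sensor noise, $N$ is the number
of snapshots, and $(\cdot)^T$ stands for the transpose. In a more
compact notation, (1) can be rewritten as
$$\mathbf{X} = \mathbf{A}(\boldsymbol{\theta})\mathbf{S} + \mathbf{N} \tag{2}$$
where $\mathbf{X} = [\mathbf{x}(1), \ldots, \mathbf{x}(N)]$ is the $n \times N$ array data matrix,
$\mathbf{S} = [\mathbf{s}(1), \ldots, \mathbf{s}(N)]$ is the $q \times N$ source waveform matrix,
and $\mathbf{N} = [\mathbf{n}(1), \ldots, \mathbf{n}(N)]$ is the $n \times N$ sensor noise matrix.
The sensor noise is assumed to be a zero-mean spatially
and temporally white Gaussian process with the unknown
diagonal covariance matrix
$$\mathbf{Q} = E\{\mathbf{n}(t)\mathbf{n}^H(t)\} = \mathrm{diag}\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\} \tag{3}$$
In what follows, the signal waveforms will be assumed to
be either deterministic unknown processes [1], or random
zero-mean Gaussian processes [2]. In particular, the signal
snapshots are assumed to satisfy the following models
$$\mathbf{x}(i) \sim \mathcal{N}(\mathbf{A}\mathbf{s}(i), \mathbf{Q}) \tag{4}$$
$$\mathbf{x}(i) \sim \mathcal{N}(\mathbf{0}, \mathbf{R}) \tag{5}$$
in the deterministic and stochastic case, respectively. Here,
$$\mathbf{R} = E\{\mathbf{x}(i)\mathbf{x}^H(i)\} = \mathbf{A}\mathbf{P}\mathbf{A}^H + \mathbf{Q} \tag{6}$$
is the array covariance matrix, $\mathbf{P} = E\{\mathbf{s}(i)\mathbf{s}^H(i)\}$ is the
source waveform covariance matrix, $\mathcal{N}$ denotes the complex
Gaussian distribution, and $(\cdot)^H$ stands for the Hermitian
transpose.
3. MAXIMUM LIKELIHOOD ESTIMATION
Under the assumption that the signal waveforms are deterministic
unknown sequences, the LL function for the model
considered is given by [11]
$$L(\boldsymbol{\psi}) = -N \sum_{k=1}^{n} \log \sigma_k^2 - \sum_{i=1}^{N} \|\tilde{\mathbf{x}}(i) - \tilde{\mathbf{A}}(\boldsymbol{\theta})\mathbf{s}(i)\|^2 \tag{7}$$
where $\boldsymbol{\psi} = [\boldsymbol{\theta}^T, \boldsymbol{\sigma}^T, \mathbf{s}^T(1), \ldots, \mathbf{s}^T(N)]^T$ is the vector of
unknown signal and noise parameters, $\boldsymbol{\sigma} = [\sigma_1^2, \ldots, \sigma_n^2]^T$,
$\tilde{\mathbf{x}}(i) = \mathbf{Q}^{-1/2}\mathbf{x}(i)$, and $\tilde{\mathbf{A}}(\boldsymbol{\theta}) = \mathbf{Q}^{-1/2}\mathbf{A}(\boldsymbol{\theta})$.
Introduce the $n \times N$ matrix
$$\mathbf{G} = \mathbf{X} - \mathbf{A}(\boldsymbol{\theta})\mathbf{S} = [\mathbf{c}_1, \ldots, \mathbf{c}_N] = [\mathbf{r}_1, \ldots, \mathbf{r}_n]^T \tag{8}$$
where $\mathbf{c}_i$ and $\mathbf{r}_i$ are the $n \times 1$ and $N \times 1$ vectors corresponding
to the $i$th column and the $i$th row of the matrix
$\mathbf{G}$, respectively. With these notations, from (7) it follows
that
$$\frac{\partial L}{\partial \sigma_k^2} = \left( \frac{\mathbf{r}_k^H \mathbf{r}_k}{\sigma_k^2} - N \right) \mathbf{e}_k^T \mathbf{Q}^{-1} \mathbf{e}_k \tag{9}$$
where $\mathbf{e}_k$ is the vector containing one in the $k$th position
and zeros elsewhere.
From (3) and (9), we obtain that if the other parameters
are fixed, the ML estimate of the diagonal noise covariance
matrix is given by
$$\hat{\mathbf{Q}} = \frac{1}{N} \mathrm{diag}\{\mathbf{r}_1^H \mathbf{r}_1, \mathbf{r}_2^H \mathbf{r}_2, \ldots, \mathbf{r}_n^H \mathbf{r}_n\} \tag{10}$$
Here, we exploit the following obvious property $[\mathbf{C}]_{k,k} =
\mathbf{r}_k^H \mathbf{r}_k$ of the matrix
$$\mathbf{C} = \sum_{i=1}^{N} \mathbf{c}_i \mathbf{c}_i^H \tag{11}$$
Inserting (10) into (7), we have
$$L(\boldsymbol{\theta}, \mathbf{S}) = -N \sum_{k=1}^{n} \log\left\{\frac{1}{N} \mathbf{r}_k^H \mathbf{r}_k\right\} - \sum_{i=1}^{N} \mathbf{c}_i^H \hat{\mathbf{Q}}^{-1} \mathbf{c}_i \tag{12}$$
Using (10)-(11) and the properties of the trace operator, we
obtain that
$$\sum_{i=1}^{N} \mathbf{c}_i^H \hat{\mathbf{Q}}^{-1} \mathbf{c}_i = \mathrm{trace}\left\{\hat{\mathbf{Q}}^{-1} \sum_{i=1}^{N} \mathbf{c}_i \mathbf{c}_i^H\right\} = \mathrm{trace}\{\hat{\mathbf{Q}}^{-1}\mathbf{C}\} = nN \tag{13}$$
Hence, after omitting the constant term (13), the LL function
(12) can be further simplified to
$$L(\boldsymbol{\theta}, \mathbf{S}) = -N \sum_{k=1}^{n} \log\left\{\frac{1}{N} \mathbf{r}_k^H \mathbf{r}_k\right\} \tag{14}$$
At the same time, from (7) we obtain in a standard way that
if the remaining parameters are fixed, the ML estimate of
the matrix $\mathbf{S}$ is given by
$$\hat{\mathbf{S}} = \left(\tilde{\mathbf{A}}^H(\boldsymbol{\theta})\tilde{\mathbf{A}}(\boldsymbol{\theta})\right)^{-1} \tilde{\mathbf{A}}^H(\boldsymbol{\theta})\tilde{\mathbf{X}} \tag{15}$$
where $\tilde{\mathbf{X}} = \mathbf{Q}^{-1/2}\mathbf{X}$ is the $n \times N$ transformed data matrix.
Note that the estimate (15) depends on $\mathbf{Q}$, and, in turn, the
estimate of $\mathbf{Q}$ in (10) depends on $\mathbf{S}$. Therefore, it appears
to be impossible to obtain any closed form expression of
the LL function concentrated with respect to the full set
of the signal and noise nuisance parameters. To avoid this
difficulty, we introduce the idea of stepwise concentration,
which was also exploited in [3] in an implicit form. The
essence of this idea is to concentrate the LL function in an
iterative manner.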
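The alternating use of the closed-form updates (15) and (10) can be sketched in a few lines. The code below is an illustrative assumption rather than the authors' implementation: it assumes a half-wavelength uniform linear array for the direction vectors, replaces the search over $\boldsymbol{\theta}$ by a plain grid search, starts from $\hat{\mathbf{Q}} = \mathbf{I}$, and uses the sum in (14) (without the factor $-N$) as the cost for a fixed noise covariance. The function names `steering`, `neg_ll`, and `stepwise_ml` are hypothetical.

```python
import numpy as np

def steering(theta, n):
    """Half-wavelength ULA direction vectors (illustrative assumption), theta in radians."""
    theta = np.atleast_1d(theta)
    return np.exp(1j * np.pi * np.arange(n)[:, None] * np.sin(theta)[None, :])

def neg_ll(theta, X, q_diag):
    """Cost sum_k log(r_k^H r_k / N) with S taken from (15) for the given diagonal of Q.
    Also returns the per-sensor residual powers r_k^H r_k / N, i.e. the diagonal in (10)."""
    n, N = X.shape
    w = 1.0 / np.sqrt(q_diag)
    A = steering(theta, n)
    S = np.linalg.lstsq(w[:, None] * A, w[:, None] * X, rcond=None)[0]   # (15)
    G = X - A @ S                                                         # (8)
    r2 = np.sum(np.abs(G) ** 2, axis=1) / N
    return np.sum(np.log(r2)), r2

def stepwise_ml(X, grid, n_iter=3):
    """Alternate a grid search over theta with the noise update (10)."""
    q_diag = np.ones(X.shape[0])                       # uniform-noise initialization
    for _ in range(n_iter):
        costs = [neg_ll(th, X, q_diag)[0] for th in grid]
        theta = grid[int(np.argmin(costs))]
        _, q_diag = neg_ll(theta, X, q_diag)           # refine Q with theta, S fixed
    return theta, q_diag
```

For a single source, `grid` can simply be an array of candidate angles in radians; for several sources each grid entry would have to be a vector of candidate DOA's, which quickly becomes expensive and is the reason practical implementations use a numerical optimizer instead.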
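Omitting the constant factor $-N$ in (14) and inserting
(15) into this equation, we obtain the following alternative
expressions for the negative LL function
$$\begin{aligned} \mathcal{L}(\boldsymbol{\theta}) &= \sum_{k=1}^{n} \log\left\{\frac{1}{N} \mathbf{r}_k^H \mathbf{r}_k\right\} \\ &= \mathrm{trace} \log\left\{\frac{1}{N} \mathbf{G}\mathbf{G}^H\right\} \\ &= \mathrm{trace} \log\left\{\frac{1}{N} \mathbf{P}^{\perp}_{\tilde{A}}(\boldsymbol{\theta}) \tilde{\mathbf{X}}\tilde{\mathbf{X}}^H \mathbf{P}^{\perp}_{\tilde{A}}(\boldsymbol{\theta})\right\} \\ &= \mathrm{trace} \log\left\{\mathbf{P}^{\perp}_{\tilde{A}}(\boldsymbol{\theta}) \hat{\tilde{\mathbf{R}}}\right\} \end{aligned} \tag{16}$$
where $\mathbf{P}_{\tilde{A}}(\boldsymbol{\theta}) = \tilde{\mathbf{A}}(\boldsymbol{\theta})\left(\tilde{\mathbf{A}}^H(\boldsymbol{\theta})\tilde{\mathbf{A}}(\boldsymbol{\theta})\right)^{-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta})$ and
$\mathbf{P}^{\perp}_{\tilde{A}}(\boldsymbol{\theta}) = \mathbf{I} - \mathbf{P}_{\tilde{A}}(\boldsymbol{\theta})$ are the projection matrices. Here,
$$\hat{\tilde{\mathbf{R}}} = \frac{1}{N} \tilde{\mathbf{X}}\tilde{\mathbf{X}}^H \tag{17}$$
is the $n \times n$ sample covariance matrix of the transformed
data.
It is important to stress that in the particular uniform
noise case ($\mathbf{Q} = \sigma^2 \mathbf{I}$), the function (16) can be simplified
to
$$\mathcal{L}(\boldsymbol{\theta}) = \mathrm{trace} \log\left\{\mathbf{P}^{\perp}_{A}(\boldsymbol{\theta}) \hat{\mathbf{R}}\right\} \tag{18}$$
where
$$\hat{\mathbf{R}} = \frac{1}{N} \mathbf{X}\mathbf{X}^H \tag{19}$$
is the sample covariance matrix of the original data (1).
Interestingly, this function is not equivalent to the conventional
negative LL function [1]
$$\mathcal{L}(\boldsymbol{\theta}) = \mathrm{trace}\left\{\mathbf{P}^{\perp}_{A}(\boldsymbol{\theta}) \hat{\mathbf{R}}\right\} \tag{20}$$
derived under the uniform noise assumption. The explanation
of this fact lies in the observation that the
ML estimators (16) and (20) use very different types of a
priori information on the structure of the noise covariance
matrix.
Another important observation is that unlike (20), the
function (16) does not enable simultaneous concentration
with respect to both signal and noise nuisance parameters.
This fact can be explained by inspecting the structure of
(16). According to this equation, the estimate of the signal
DOA vector $\boldsymbol{\theta}$ depends on the estimate (10) of the matrix $\mathbf{Q}$,
which, in turn, depends on the estimate of $\boldsymbol{\theta}$. To overcome
this problem, instead of the analytic concentration approach
used for the derivation of the uniform ML estimator,
we propose the so-called stepwise numerical concentration,
which is given by the following iterative procedure:
• Step 1. Set $\hat{\mathbf{Q}} = \mathbf{I}$.
• Step 2. Find the estimate of $\boldsymbol{\theta}$ as $\hat{\boldsymbol{\theta}} =
\arg\min_{\boldsymbol{\theta}} \{\mathcal{L}(\boldsymbol{\theta})\}$, where the negative LL function
$\mathcal{L}(\boldsymbol{\theta})$ is defined by (16).
• Step 3. Using the so-obtained $\hat{\boldsymbol{\theta}}$, compute $\hat{\mathbf{S}}$ from
(15). Find the refined estimate of $\mathbf{Q}$ from (10)
using (8) and the previously obtained (fixed) $\hat{\mathbf{S}}$
and $\hat{\boldsymbol{\theta}}$. Repeat steps 2 and 3 a few times to obtain
the final estimate of $\boldsymbol{\theta}$.
In step 1, the algorithm is initialized using the uniform
noise assumption. Under this assumption, the estimate of
$\mathbf{Q}$ should be written as $\hat{\mathbf{Q}} = \hat{\sigma}^2 \mathbf{I}$, where $\hat{\sigma}^2$ is some estimate
of the noise variance $\sigma^2$. However, from the structure of the
negative LL function (16) it follows that the minimizer of
this function does not depend on the value of $\hat{\sigma}^2$. Therefore,
without loss of generality in step 1 we can set $\hat{\sigma}^2 = 1$.
4. CRAMER-RAO BOUNDS
The following two theorems present closed-form expressions
for the deterministic and stochastic CRB's under the nonuniform
noise assumption.
Theorem 1: The $q \times q$ deterministic CRB matrix for the
signal DOA's is given by:
$$\mathrm{CRB}_{\mathrm{DET}\,\theta\theta} = \frac{1}{2N} \left\{ \mathrm{Re}\left[ \left(\tilde{\mathbf{D}}^H \mathbf{P}^{\perp}_{\tilde{A}} \tilde{\mathbf{D}}\right) \odot \hat{\mathbf{P}}^T \right] \right\}^{-1} \tag{21}$$
where $\tilde{\mathbf{A}} = \mathbf{Q}^{-1/2}\mathbf{A}$, $\tilde{\mathbf{D}} = \mathbf{Q}^{-1/2}\mathbf{D}$, $\hat{\mathbf{P}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{s}(i)\mathbf{s}(i)^H$,
$\odot$ stands for the Schur-Hadamard matrix product, and
$$\mathbf{D} = \left[ \left.\frac{d\mathbf{a}(\theta)}{d\theta}\right|_{\theta=\theta_1}, \left.\frac{d\mathbf{a}(\theta)}{d\theta}\right|_{\theta=\theta_2}, \ldots, \left.\frac{d\mathbf{a}(\theta)}{d\theta}\right|_{\theta=\theta_q} \right] \tag{22}$$
Proof: See [11].
Theorem 2: The $q \times q$ stochastic CRB matrix for the
signal DOA's is given by:
$$\mathrm{CRB}_{\mathrm{STO}\,\theta\theta} = \frac{1}{N} \left\{ 2\mathrm{Re}\left[ \left(\mathbf{P}\tilde{\mathbf{A}}^H \tilde{\mathbf{R}}^{-1} \tilde{\mathbf{A}}\mathbf{P}\right) \odot \left(\tilde{\mathbf{D}}^H \mathbf{P}^{\perp}_{\tilde{A}} \tilde{\mathbf{D}}\right)^T \right] - \mathbf{M}\mathbf{T}\mathbf{M}^T \right\}^{-1} \tag{23}$$
where $\tilde{\mathbf{R}} = \mathbf{Q}^{-1/2}\mathbf{R}\mathbf{Q}^{-1/2}$ and the real matrices $\mathbf{M}$ and $\mathbf{T}$ are given by
$$\mathbf{M} = 2\mathrm{Re}\left\{ \left(\tilde{\mathbf{R}}^{-1}\tilde{\mathbf{A}}\mathbf{P}\right)^T \odot \left(\tilde{\mathbf{D}}^H \mathbf{P}^{\perp}_{\tilde{A}}\right) \right\}, \tag{24}$$
$$\mathbf{T} = \left\{ \left(\tilde{\mathbf{R}}^{-1}\right)^T \odot \tilde{\mathbf{R}}^{-1} - \left(\mathbf{P}_{\tilde{A}}\tilde{\mathbf{R}}^{-1}\right)^* \odot \left(\mathbf{P}_{\tilde{A}}\tilde{\mathbf{R}}^{-1}\right) \right\}^{-1} \tag{25}$$
Proof: See [11].
It is interesting to compare the derived expressions with
the deterministic and stochastic CRB's in the uniform noise
case. The latter two bounds are given by [1], [2], [10]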
CRBdet(,0 = ^{Re[(X?ffP^)0PT]}_1(26)
CRBsto00 = ^{Be[(PAHR-1AP)
©(z?f/p^ir1r>)T]} 1 (27)
respectively.
The comparison of (21) and (26) shows that the nonuniform deterministic bound (21) corresponds to the uniform CRB (26), with the only difference that the nonuniform CRB uses the transformed array manifold $\tilde A$ instead of the original manifold A. This transformation can be viewed as a sort of preequalization of sensor noise¹. To explain the effect of noise preequalization, let us consider the case when some of the array sensors suffer from intensive noise, whereas the other sensors remain relatively “noiseless”. According to the above-mentioned manifold transformation, the contribution of the noisy sensors to the CRB (21) will be negligible because of the relatively low weights assigned to these sensors. This corresponds to our natural expectation that the optimal (ML) algorithm derived for the nonuniform model should be insensitive to the presence of such noisy sensors. Such a robustness property is achieved by means of blocking the outputs of the corresponding (noisy) array channels and exploiting only noiseless sensors. From this point of view, the manifold transformation matrix $Q^{-1/2}$ can be identified as a sort of blocking matrix.
As can be seen from the comparison of (23) and (27), in the stochastic case the relationship between the uniform and nonuniform bounds becomes more complicated than in the deterministic case. In particular, this relationship cannot be described solely in terms of the manifold transformation $Q^{-1/2}$. We observe that the bound (23) contains an additional term $-MTM^T$ which does not appear in (27). In the general case, we obtain that
$$\text{Nonuniform } \mathrm{CRB}_{\mathrm{DET},\theta\theta}\Big|_{Q=\sigma^2 I} = \text{Uniform } \mathrm{CRB}_{\mathrm{DET},\theta\theta}$$
$$\text{Nonuniform } \mathrm{CRB}_{\mathrm{STO},\theta\theta}\Big|_{Q=\sigma^2 I} \geq \text{Uniform } \mathrm{CRB}_{\mathrm{STO},\theta\theta}$$
The proof of the last equation is given in [11].
Assume that there is only one signal source (q = 1). In this case, we have that $\tilde A = \tilde a$ and $\tilde D = \tilde d$, where $\tilde a = Q^{-1/2}a$. Therefore, the array covariance matrix (6) can be rewritten as $R = p\,aa^H + Q$, where $p = E\{|s(i)|^2\}$ is the signal variance. It is easy to show that in this case the bounds (21) and (23) can be simplified to [5]
$$\mathrm{CRB}_{\mathrm{DET},\theta\theta} = \frac{a^H Q^{-1} a}{2N\hat p\left[a^H Q^{-1}a\; a^H B^2 Q^{-1}a - (a^H B Q^{-1} a)^2\right]}$$
$$\mathrm{CRB}_{\mathrm{STO},\theta\theta} = \frac{1 + p\,a^H Q^{-1} a}{2Np^2\left[a^H Q^{-1}a\; a^H B^2 Q^{-1}a - (a^H B Q^{-1} a)^2\right]}$$
where $B = \mathrm{diag}\{(\omega/c)d_1\cos\theta_1, (\omega/c)d_2\cos\theta_1, \ldots, (\omega/c)d_n\cos\theta_1\}$, $\hat p = \frac{1}{N}\sum_{i=1}^{N}|s(i)|^2$, $d_k$ is the coordinate of the k-th sensor, $\omega$ is the central frequency, and c is the propagation speed.
Assuming that the array has omnidirectional sensors, the number of snapshots is high ($\hat p \simeq p$), and defining the SNR as [5] $\mathrm{SNR} = (p/n)\,a^H Q^{-1} a = (p/n)\sum_{k=1}^{n} 1/\sigma_k^2$, we obtain the following explicit relationship between the stochastic and deterministic single-source bounds:
$$\mathrm{CRB}_{\mathrm{STO},\theta\theta} = \left(1 + \frac{1}{n\,\mathrm{SNR}}\right)\mathrm{CRB}_{\mathrm{DET},\theta\theta} \qquad (28)$$
Hence, in the large sample case the difference between the two bounds becomes small when the source is powerful enough, so that $n\,\mathrm{SNR} \gg 1$.
¹ Usually, the term prewhitening is used, but it would be somewhat confusing to use it here because the sensor noise has been originally assumed to be spatially white.
[Figure 1: Comparison of the DOA estimation RMSE's and CRB's. First example (uncorrelated sources); horizontal axis: number of snapshots.]
5. SIMULATIONS
We assumed a ULA of ten sensors spaced half a wavelength apart, and two equally powered sources with the DOA's $\theta_1 = 7°$ and $\theta_2 = 13°$. The nonuniform noise was assumed to have the following covariance matrix: Q = diag{10.0, 2.0, 1.5, 0.5, 8.0, 0.7, 1.1, 3.0, 6.0, 3.0}. In all our
examples, the experimental DOA estimation Root-Mean-
Square Errors (RMSE’s) of the conventional uniform and
the proposed nonuniform ML methods have been compared
to the nonuniform CRB’s (21) and (23).
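For intuition about the size of these benchmark bounds, the single-source expressions of Section 4 and relation (28) can be evaluated for this array and noise covariance. The following is a minimal sketch for a single source (not the two-source scenario simulated here). It assumes the steering vector $a_k = \exp(j(\omega/c)d_k\sin\theta)$ with half-wavelength spacing and sets $\hat p = p$; the function name and the choices p = 1, N = 100 are purely illustrative.

```python
import cmath
import math


def single_source_crbs(theta_deg, q, p, N, spacing=0.5):
    """Single-source deterministic and stochastic CRBs (in rad^2) for a ULA.

    q: per-sensor noise variances (diagonal of Q); p: source power;
    spacing: sensor spacing in wavelengths (assumed). Uses p_hat = p.
    """
    n = len(q)
    theta = math.radians(theta_deg)
    # (omega/c) d_k = 2*pi*spacing*k when positions are measured in wavelengths.
    a = [cmath.exp(2j * math.pi * spacing * k * math.sin(theta)) for k in range(n)]
    b = [2 * math.pi * spacing * k * math.cos(theta) for k in range(n)]  # diag of B
    aQa = sum(abs(a[k]) ** 2 / q[k] for k in range(n))               # a^H Q^-1 a
    aB2Qa = sum(b[k] ** 2 * abs(a[k]) ** 2 / q[k] for k in range(n))  # a^H B^2 Q^-1 a
    aBQa = sum(b[k] * abs(a[k]) ** 2 / q[k] for k in range(n))        # a^H B Q^-1 a
    bracket = aQa * aB2Qa - aBQa ** 2
    crb_det = aQa / (2 * N * p * bracket)
    crb_sto = (1 + p * aQa) / (2 * N * p ** 2 * bracket)
    snr = (p / n) * aQa
    return crb_det, crb_sto, snr


q = [10.0, 2.0, 1.5, 0.5, 8.0, 0.7, 1.1, 3.0, 6.0, 3.0]
n = len(q)
crb_det, crb_sto, snr = single_source_crbs(theta_deg=7.0, q=q, p=1.0, N=100)

# Square roots of the bounds in degrees, then both sides of relation (28).
print(math.degrees(math.sqrt(crb_det)), math.degrees(math.sqrt(crb_sto)))
print(crb_sto / crb_det, 1 + 1 / (n * snr))
```

The two numbers on the last line agree, which is just relation (28) restated: the ratio of the bounds depends only on the product n SNR.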
In the first example, we assume two uncorrelated sources with the SNR = 10 dB. Fig. 1 displays the results versus the number of snapshots. In the second example, two correlated sources are assumed, with the correlation coefficient equal to 0.9. The SNR = 15 dB is taken and the results are plotted in Fig. 2 versus the number of snapshots.
From Figs. 1-2, we observe that uniform ML performs poorly in the nonuniform noise case. As expected, the proposed nonuniform technique provides substantial performance improvements. In particular, it attains the stochastic CRB (23) even at small sample sizes. Since two iterations are enough to guarantee the convergence, the computational cost of our technique is comparable to that of conventional ML.
[Figure 2: Comparison of the DOA estimation RMSE's and CRB's. Second example (correlated sources).]
6. EXPERIMENTAL RESULTS
To validate the practical relevance of the nonuniform noise model, real seismic data were used. These data were collected by the GERESS array (Germany). The data record of the regional seismic event at an azimuth of $\theta = 121.8°$ was analyzed (see [12] for details). Note that the azimuth value of this event was known in advance with a high precision. Estimating this parameter using the methods tested, we were able to compare their experimental performances.
The conventional and proposed ML methods have been applied to azimuth-velocity (2D) estimation at the following four frequencies: $f_1 = 0.9375$ Hz, $f_2 = 1.25$ Hz, $f_3 = 1.5625$ Hz, and $f_4 = 1.875$ Hz.
The experimental azimuth estimates have been used to compute the experimental RMSE's shown in Fig. 3. From this figure, it is clearly seen that nonuniform ML has noticeably better experimental performance than the uniform ML technique. These results support the relevance of the developed nonuniform noise model in practical applications.
[Figure 3: Comparison of the DOA estimation RMSE's. Real seismic array data. Markers: uniform ML (*), nonuniform ML with two iterations (o); horizontal axis: frequency (Hz).]
REFERENCES
[1] P. Stoica and A. Nehorai, “MUSIC, maximum likelihood and Cramer-Rao bound,” IEEE Trans. ASSP, 37, pp. 720-741, May 1989.
[2] P. Stoica and A. Nehorai, “Performance study of conditional and unconditional direction-of-arrival estimation,” IEEE Trans. ASSP, 38, pp. 1783-1795, Oct. 1990.
[3] J.F. Bohme and D. Kraus, “On least squares methods for direction of arrival estimation in the presence of unknown noise fields,” ICASSP'88, NY, pp. 2833-2836, Apr. 1988.
[4] A.B. Gershman, A.L. Matveyev, and J.F. Bohme, “Maximum likelihood estimation of signal power in sensor array in the presence of unknown noise field,” IEE Proc. RSN, F-142, pp. 218-224, Oct. 1995.
[5] A.L. Matveyev, A.B. Gershman, and J.F. Bohme, “On the direction estimation Cramer-Rao bounds in the presence of uncorrelated unknown noise,” Circ., Syst., Signal Processing, 18, pp. 479-487, 1999.
[6] J. LeCadre, “Parametric methods for spatial signal processing in the presence of unknown colored noise fields,” IEEE Trans. ASSP, 37, pp. 965-983, July 1989.
[7] B. Friedlander and A.J. Weiss, “Direction finding using noise covariance modeling,” IEEE Trans. SP, 43, pp. 1557-1567, July 1995.
[8] P. Stoica, M. Viberg, K.M. Wong, and Q. Wu, “Maximum-likelihood bearing estimation with partly calibrated arrays in spatially correlated noise field,” IEEE Trans. SP, 44, pp. 888-899, Apr. 1996.
[9] U. Nickel, “On the influence of channel errors on array signal processing methods,” Int. J. Electron. and Comm., 47, pp. 209-219, 1993.
[10] A.J. Weiss and B. Friedlander, “On the Cramer-Rao bound for direction finding of correlated sources,” IEEE Trans. SP, 41, pp. 495-499, Jan. 1993.
[11] M. Pesavento and A.B. Gershman, “Maximum-likelihood direction of arrival estimation in the presence of unknown nonuniform noise,” submitted.
[12] D.V. Sidorovich and A.B. Gershman, “2-D wideband interpolated root-MUSIC applied to measured seismic data,” IEEE Trans. SP, 46, pp. 2263-2267, Aug. 1998.
MATCHED SYMMETRICAL SUBSPACE DETECTOR
Victor S. Golikov , Francisco C. Pareja
Ciencia y Tecnologia del Mayab, A. C.
Calle 12, No. 199,dep.5, entre 19 y 21, Col. Garcia Gineres , C.P. 97070, Merida, Yucatan, Mexico
ABSTRACT
Optimal detection/estimation algorithms require large computing expenditures in radar, sonar, and similar systems. This paper presents a new Uniformly Most Powerful test for matched detection of a symmetrical signal subspace. The group of general (logical) shift operators is used to describe the symmetry. The algorithm may be used to reduce the complexity of a matched detector for an unknown signal subspace and for signal processing in real time. The reduction brings appreciable hardware gains at a small performance penalty in some radar systems. The signal subspace model for moving-target indication in radar is considered. We use the new approach to create a sub-optimal detector with minimal computing expenditures.
1. INTRODUCTION
In signal detection problems, we assume that each measurement is a sum of a signal component and a noise component: $x_n = \mu s_n + \sigma w_n$; n = 0, 1, ..., N-1.
The measurements are organized into an N-dimensional measurement vector
$$x = \mu s + \sigma w, \qquad (1)$$
where the vector $\mu s$ contains samples of the signal to be detected and the vector $\sigma w$ contains samples of the added noise. We assume that the noise vector w is drawn from a multivariate normal distribution w: N[0, I]. This means that the measurement x is drawn from a multivariate normal distribution x: N[$\mu s$, $\sigma^2 I$].
In some systems the signal s in the measurement model x: N[$\mu s$, $\sigma^2 I$] is a linear combination of modes or basis vectors, in which case it may be represented as $s = \sum_{n=0}^{N-1}\theta_n h_n = H\theta$. Here H is a known N x N matrix with columns $h_n$ and $\theta$ is an unknown N x 1 vector with elements $\theta_n$:
$$s = [h_0\ h_1\ \ldots\ h_{N-1}]\begin{bmatrix}\theta_0\\ \vdots\\ \theta_{N-1}\end{bmatrix}. \qquad (2)$$
Let the mode matrix H be known but the mode weights be unknown. In this case, the signal is known to lie in the linear subspace ⟨H⟩ spanned by the columns of H, but its exact location is unknown because $\theta$ is unknown. We would like to test $H_0$: $\mu = 0$ versus $H_1$: $\mu > 0$ when x is distributed as N[$\mu H\theta$, $\sigma^2 I$] and $\theta$ is unknown. It is known [1] that the statistic
$$\chi^2 = x^T P_H x \qquad (3)$$
is a maximal invariant to the group of transformations that adds a bias from the orthogonal subspace ⟨A⟩ and rotates in the subspace ⟨H⟩. Here $P_H$ is the projection onto the subspace ⟨H⟩:
$$P_H = H(H^TH)^{-1}H^T. \qquad (4)$$
The statistic $\chi^2$ is a quadratic form in the normal random vector $P_H x$: N[$\mu H\theta$, $\sigma^2 P_H$]. It is known that $\chi^2/\sigma^2$ is chi-squared distributed with noncentrality parameter $(\mu^2/\sigma^2)E_s$, $E_s = \theta^TH^TH\theta$: $\chi^2/\sigma^2$: $\chi^2_p(\mu^2E_s/\sigma^2)$.
The chi-squared distribution has a monotone likelihood ratio. Therefore, by the Karlin-Rubin theorem, the test
$$\phi(\chi^2/\sigma^2) = \begin{cases}1, & \chi^2/\sigma^2 > \chi_0^2\\ 0, & \chi^2/\sigma^2 \leq \chi_0^2\end{cases} \qquad (5)$$
is the Uniformly Most Powerful (UMP) invariant detector for testing $H_0$: $\mu = 0$ versus $H_1$: $\mu > 0$ in the measurement x: N[$\mu H\theta$, $\sigma^2 I$]. In what follows we consider a subspace ⟨H⟩ that is symmetrical with respect to the group of generalized (logical) shift transformations, and we establish that the statistic (3) is also a maximal invariant to the group of general shift transformations for a symmetrical signal subspace ⟨H⟩.
2. DESCRIPTION OF THE SYMMETRICAL SUBSPACE
The operation $t \oplus \tau$ is called the generalized (logical) shift of an argument t by a value $\tau$, where $t, \tau \in [0, N-1]$,
$$t = \sum_{p=1}^{n} t_p m^{p-1}, \qquad \tau = \sum_{p=1}^{n} \tau_p m^{p-1}, \qquad t \oplus \tau = \sum_{p=1}^{n} c_p m^{p-1},$$
$c_p = ((t_p + \tau_p))_m$ is the residue (mod m), $c_p, t_p, \tau_p \in [0, m-1]$, and $N = m^n$. Let g(h) denote the operator of a generalized shift [2]. We represent a discrete mode of a signal as a column vector $h = (h_0\ h_1\ \ldots\ h_{N-1})^T$. The generalized shift operation can be represented as a permutation of the coordinates of this vector. It is possible to represent the operators of generalized shift by block cyclic permutation matrices. The matrix $g_i \in G$ is a permutation matrix; therefore each of its rows and each of its columns contains a single 1, and all of the remaining entries are zero. Let ⟨H⟩ be a symmetrical subspace. Then $h_i = g_l h_k$, $i = l \oplus k$; $i, l, k \in [0, N-1]$. Therefore the symmetrical matrix H may be written as
$$H = [\,h\ \ g_1h\ \ \ldots\ \ g_{N-1}h\,]. \qquad (6)$$
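The digit-wise definition of the generalized shift and the closure property $g_l h_k = h_{l \oplus k}$ behind (6) can be checked with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and the action of $g_i$ on a vector is assumed to be $(g_i x)[t] = x[t \ominus i]$, where $\ominus$ is the digit-wise difference modulo m (for m = 2 both operations reduce to bitwise XOR).

```python
def digits(value, m, n):
    """Base-m digits of value, least significant first (n digits)."""
    out = []
    for _ in range(n):
        out.append(value % m)
        value //= m
    return out


def gen_shift(t, tau, m, n):
    """Generalized (logical) shift: digit-wise addition modulo m."""
    c = [(a + b) % m for a, b in zip(digits(t, m, n), digits(tau, m, n))]
    return sum(cp * m ** p for p, cp in enumerate(c))


def gen_diff(t, tau, m, n):
    """Inverse operation: digit-wise subtraction modulo m."""
    c = [(a - b) % m for a, b in zip(digits(t, m, n), digits(tau, m, n))]
    return sum(cp * m ** p for p, cp in enumerate(c))


def shift_vector(x, i, m, n):
    """Apply g_i as a permutation: (g_i x)[t] = x[t (-) i] (assumed convention)."""
    return [x[gen_diff(t, i, m, n)] for t in range(m ** n)]


m, n = 3, 2
N = m ** n
h = [float(v) for v in range(N)]                       # arbitrary first mode
H_cols = [shift_vector(h, i, m, n) for i in range(N)]  # columns h_i = g_i h

# Shifting column k by l must give column l (+) k for every pair (l, k).
closed = all(shift_vector(H_cols[k], l, m, n) == H_cols[gen_shift(l, k, m, n)]
             for l in range(N) for k in range(N))
print(gen_shift(5, 4, m, n), closed)
```

Because each $g_i$ only permutes coordinates, applying it costs no multiplications, which is what the detector in the next section relies on.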
The subspace ⟨H⟩ is called symmetrical if a mode $h_i \in H$ transformed by the group G also belongs to the subspace ⟨H⟩, but with another value of the parameter i: $g h_i = h_r \in H$, $i, r \in [0, N-1]$. Note that g is an orthogonal matrix: $gg^T = I$. We have the following representation for the operator g: $g_i = V^H W_i V$, where $V = N^{-1/2}[\mathrm{Had}(t,\tau)]$, $\mathrm{Had}(t,\tau) = \exp[j(2\pi/m)\sum_{i=1}^{n} t_i\tau_i]$, $j = \sqrt{-1}$, $W_i = \mathrm{diag}[\mathrm{Had}(i,\tau)]$, and $V^HV = W_i^HW_i = I$. We simplify our notation by writing $(V^T)^*$ as $V^H$, where T denotes transposition and * denotes complex conjugation. The eigenvectors of the generalized shift operators form the complete orthonormal system of Hadamard-Chrestenson functions:
$$\mathrm{Had}(p,t) = \exp\left[j\frac{2\pi}{m}\sum_{i=1}^{n} p_i t_i\right], \qquad p = \sum_{i=1}^{n} p_i m^{i-1}, \qquad t = \sum_{i=1}^{n} t_i m^{i-1}. \qquad (7)$$
0-7803-5988-7/00/$10.00 © 2000 IEEE
At m = 2 they are called Walsh functions; at m = N they are called discrete exponential functions.
The matrix H is a block circulant matrix and may be written as
$$H = V^H\Lambda V = g^T H g \quad \text{for any } g \in G, \qquad (8)$$
then
$$P_H = H(H^TH)^{-1}H^T = V^H\Omega V, \qquad (9)$$
where $\Lambda = \mathrm{diag}(\lambda_0 \ldots \lambda_{N-1})$, $\lambda_i$ is an eigenvalue of the matrix H, $\Omega = \mathrm{diag}(\varepsilon_0\ \varepsilon_1 \ldots \varepsilon_{N-1})$, and $\varepsilon_i$ is an eigenvalue of the matrix $P_H$. The entries of the diagonal matrix $\Lambda$ are the Hadamard-Chrestenson transform of the first column h of the matrix H. Similarly, the entries of the diagonal matrix $\Omega$ are the Hadamard-Chrestenson transform of the first column of the matrix $P_H$.
3. NEW DETECTION ALGORITHM FOR SYMMETRICAL SIGNAL SUBSPACE
The sufficient statistic for the parameter $\mu$ is (3): $\chi^2 = x^TP_Hx$.
The operator $P_H$ is a block circulant matrix and it may be written as
$$P_H = H(H^TH)^{-1}H^T = g^T P_H g. \qquad (10)$$
Now we will establish that the statistic (3) is a maximal invariant to the group of general shift transformations $G = \{g: g(x) = gx = V^HWVx\}$ under conditions (6), (8), (9). It is clear that:
1. $(gx)^TP_H\,gx = x^TP_Hx. \qquad (11)$
2. $x_1^TP_Hx_1 = x_2^TP_Hx_2 \Rightarrow x_1^TV^H\Omega Vx_1 = x_2^TV^H\Omega Vx_2 \Rightarrow X_1^H\Omega X_1 = X_2^H\Omega X_2 \Rightarrow \|\Omega^{1/2}X_1\|^2 = \|\Omega^{1/2}X_2\|^2 \Rightarrow x_1 = gx_2, \qquad (12)$
where $X_1 = Vx_1$ and $X_2 = Vx_2$ are the Hadamard-Chrestenson transforms of $x_1$ and $x_2$, and $\|\cdot\|$ is the Euclidean norm. The maximal invariant may be written as
$$w_p = \frac{1}{\sqrt N}\sum_{i=0}^{N-1} h_i\,\mathrm{Had}^*(p,i). \qquad (13)$$
The statistic (3) requires $N^2$ multiplication operations and $N^2$ addition operations. The new statistic (11) requires N multiplication operations and $N^2$ addition operations for m = 2. In this case the Walsh transform is used instead of the Hadamard-Chrestenson transform.
The statistic (3) incurs no performance penalty if the signal subspace is symmetrical.
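For the dyadic case m = 2, the claims above can be checked numerically: a matrix with entries $h[i \oplus j]$ is diagonalized by the Walsh-Hadamard transform, so $x^TP_Hx$ can be evaluated after an addition-only transform with one weight per coefficient. The sketch below is illustrative only. It uses a hypothetical rank-4 example with N = 8, the unnormalized transform in natural (Hadamard) ordering, and a butterfly implementation of the transform, which is a standard shortcut rather than something taken from the paper.

```python
def fwht(x):
    """Unnormalized Walsh-Hadamard transform (natural ordering).
    Uses only additions and subtractions; len(x) must be a power of 2."""
    y = list(x)
    h = 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for i in range(start, start + h):
                y[i], y[i + h] = y[i] + y[i + h], y[i] - y[i + h]
        h *= 2
    return y


N = 8
lam = [4.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 3.0]   # chosen eigenvalues of H (rank 4)
h = [v / N for v in fwht(lam)]                    # first column of H
H = [[h[i ^ j] for j in range(N)] for i in range(N)]     # dyadic-shift symmetric H

# Eigenvalues of P_H: 1 where H has a nonzero eigenvalue, 0 elsewhere.
eps = [1.0 if abs(v) > 1e-9 else 0.0 for v in fwht(h)]


def statistic(x):
    """x^T P_H x evaluated in the Walsh domain."""
    X = fwht(x)
    return sum(e * v * v for e, v in zip(eps, X)) / len(x)


theta = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]
x_in = [sum(H[i][j] * theta[j] for j in range(N)) for i in range(N)]  # lies in <H>
x_shift = [x_in[t ^ 5] for t in range(N)]                 # dyadic shift by 5
x_out = [(-1.0) ** bin(t & 1).count("1") for t in range(N)]  # orthogonal to <H>

print(statistic(x_in), sum(v * v for v in x_in))   # full energy is retained
print(statistic(x_shift), statistic(x_out))        # shift-invariant; zero outside <H>
```

A vector inside ⟨H⟩ keeps all of its energy, a dyadically shifted copy gives the same value as in (11), and a Walsh function with a zero eigenvalue gives zero, which is the behaviour expected of the projection $P_H$.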
However, exact symmetry of the signal subspace seldom holds for a real signal model. Consider N continuous-time cosinusoids of the form $A_i\cos(\omega_i t + \varphi_i)$ that are summed to produce the signal s(t). If this signal is sampled at the sampling instants t = nT, then the discrete-time signal is:
$$s_n = \sum_{i=0}^{N-1} A_i\cos(\omega_iTn + \varphi_i).$$
Typically, such samples are taken over an interval $0 \leq t < NT$ to produce the sample vector $s = [s_0\ s_1\ \ldots\ s_{N-1}]^T$. The vector of samples s may be written as $s = \mathrm{Re}\,H\theta$, where $H = [h_0\ h_1\ \ldots\ h_{N-1}]$, $\theta = [\theta_0\ \theta_1\ \ldots\ \theta_{N-1}]^T$, $h_i = [1\ \exp(j\omega_iT)\ \ldots\ \exp(j\omega_iT(N-1))]^T$, $\theta_i = A_i\exp(j\varphi_i)$, $\omega_i = \omega_0 i$. We assume that s is an N-vector that is constructed from a linear combination of linearly independent cosines and sines, provided T = 1, $\omega_i = (2\pi/N)i$, $i \in [0, N-1]$. The mode $h_i$ is a complex exponential mode and $H^HH = NI$. The algorithm (11) consists of two parts: the coherent detector
$$y_k = \frac{1}{\sqrt N}\sum_{p=0}^{N-1}\left[w_p\,\frac{1}{\sqrt N}\sum_{i=0}^{N-1} x_i\,\mathrm{Had}^*(p,i)\right]\mathrm{Had}(k,p) \qquad (14)$$
and the energy detector $\chi^2 = \sum_{k=0}^{N-1}(y_k)^2$. The test (3) may be written as
$$\chi^2 = x^TP_Hx = x^TH(H^TH)^{-1}H^Tx \qquad (15)$$
where
$$t = H^Tx, \qquad e = \|h_i\|^2. \qquad (16)$$
The known algorithm (15) and the obtained algorithm (11) differ in their coherent detectors (14) and (16). We compare the signal-to-noise ratio $(\mathrm{SNR})_1$ for test (14) and $(\mathrm{SNR})_2$ for test (16) for each mode of H. Let $Z_k = [(\mathrm{SNR})_1]_k/[(\mathrm{SNR})_2]_k$ denote the factor of noise immunity loss. $[(\mathrm{SNR})_1]_k$ may be written as $[(\mathrm{SNR})_1]_k = \mu y_k/\sigma$, and $[(\mathrm{SNR})_2]_k$ may be written as $[(\mathrm{SNR})_2]_k = \mu N/\sigma$. Then the factor of noise immunity loss is
$$Z_k = \mu y_k/\mu N. \qquad (17)$$
It is plotted for N = 64, m = 2, $\omega_0 = 1$ in Figure 1. This curve may be used to compute the effective loss in SNR that results from the subspace ⟨H⟩ not being exactly symmetrical with respect to the dyadic shift group.
This implementation of the coherent detector has N operations of multiplication and $N^2$ operations of addition. The structure of the implementation for the known test $t = H^Tx$ consists of N branches. Each branch is a correlator of the transformed data with stored modes. Therefore the known test structure has $N^2$ operations of multiplication and N operations of addition. The advantage of the new algorithm is obvious. The implementation is hardware-efficient, but it is sub-optimum.
[Figure 1: Signal-to-noise effective loss versus mode for m = 2 and N = 64, $\omega_0 = 1$.]
[Figure 2: Implementation of the coherent detector for symmetrical signal subspace.]
The accuracy of the symmetry in subspace ⟨H⟩ defines the noise immunity of this algorithm. In our case the noise immunity losses are smaller than 3 dB (0.5 to 1) for half of the modes. Our research has shown that this relation is preserved as N increases. Note that when the symmetry in subspace ⟨H⟩ is not exact, the SNR for some modes may be maximized by choosing $h_0(\omega_0)$. This is illustrated in Figure 3 for m = 2, N = 64 and $\omega_0 = 1.3$. In this case some other modes have a much higher SNR than for $\omega_0 = 1$ (Fig. 1). Note that it is possible to change the type of symmetry in this problem. We can choose m = 3, 4, 5, .... However, as m increases, the complexity of the test increases.
4. SUMMARY
A new algorithm for a matched symmetrical subspace detector has been presented. It may be used to reduce the complexity of the known algorithm for signal subspace detection. High-quality performance is obtained for moving-target indication under unknown Doppler frequency. We used the new approach to create a sub-optimal detector with minimal computing expenditures.
5. REFERENCES
[1] Louis L. Scharf. Statistical Signal Processing: Detection, Estimation, and Time Series Analysis. Addison-Wesley, 1991.
[2] V. Golikov. “The Theory of Optimal M-ary Interperiod Processing when Detecting Fluctuating Signals on a Background of Correlated Interference and Noise”, Radioelectronics and Communications System, vol. 31, pages 2-6, April, 1988.
MULTIPLE SOURCE DIRECTION FINDING WITH AN ARRAY OF M SENSORS USING
TWO RECEIVERS
E. Fishler and H. Messer
Department of Electrical Engineering-Systems, Tel Aviv University,
Tel Aviv 69978, Israel,
E-mail: {eranf,messer}@eng.tau.ac.il.
ABSTRACT
Multiple source direction finding algorithms (e.g., MUSIC) are applied to simultaneous measurements collected by M sensors. However, practical considerations may dictate using fewer receivers than sensors, such that the measurements cannot be collected simultaneously. In such cases, data is collected sequentially from the different array elements in a process which is referred to as “time varying preprocessing”, or “switching”.
In this paper we study multiple source direction finding (DF) with an array of M > 2 elements, where only two receivers are available.
1. INTRODUCTION AND PROBLEM
FORMULATION
Direction finding with fewer receivers than sensors via time varying processing is a very important issue (e.g., [3]). In many practical scenarios the number of receivers is considerably smaller than the number of sensors. Moreover, the tendency is to use the minimum number of receivers that maintains spatial capacity, i.e., only two receivers. Reducing the number of receivers results in a cheaper and simpler design, at the cost of reduced performance. In this paper we investigate the multiple source localization performance from the identification point of view. We first find how many sources can be localized with only two receivers, and then we suggest a computationally efficient algorithm to perform this task.
Assume q far-field narrowband sources impinging on an array with p > q sensors from directions $\{\theta_1, \ldots, \theta_q\}$. Using complex signal representation, the vector of received signals can be written as:
$$x(t) = A(\theta)s(t) + n(t) \qquad (1)$$
where s(t) is the complex envelope of the slowly varying signals, n(t) is the additive noise, $\theta$ is the vector of directions of arrival, and $A(\theta) = [a(\theta_1), \ldots, a(\theta_q)]$, where $a(\theta)$ is the array steering vector at direction $\theta$. We denote by $[x(t)]_i$ the i-th element of the vector x(t).
Under the standard assumptions about the noise being Gaussian and white and of the signals being Gaussian, the correlation matrix of x(t), denoted by $R_x(\theta)$, is given by:
$$R_x(\theta) = A(\theta)R_sA^H(\theta) + \sigma^2I \qquad (2)$$
where $(\cdot)^H$ denotes the complex conjugate transpose operation, $\sigma^2$ is the noise level and $R_s$ is the signal covariance matrix.
The problem of estimating $\theta$ from a set of N snapshots of the array, $x(t_1), \ldots, x(t_N)$, is usually referred to as the localization problem. The case of spatial samples which are a time-dependent linear transformation of the array output is discussed in [3]. The resulting model for the measurements is $y(t_i) = G(t_i)x(t_i)$, where $G(t_i)$ is the time-dependent linear transformation. Note that $G(t_i)$ is a matrix in which the number of rows is the number of receivers used at time $t_i$.
We are interested in the special case where $G(t_i)$ is a 2 x p matrix such that each row is a vector with all elements but one equal to zero, where the nonzero element equals 1. Without loss of generality, we assume that we take N snapshots of each sub-array of two elements. The total number of snapshots is $L = \binom{p}{2}N$. At time instant $t_i$, i = 1, ..., L, the output of the reduced array is $y(t_i) = [[x(t_i)]_k\ [x(t_i)]_l]^T$ for some $k \neq l \in \{1, \ldots, p\}$.
2. SOME RELATED RESULTS
In [3] the ML estimator for a general transformation matrix $G(t_i)$ is presented. This procedure involves maximization over all unknown parameters: $\theta$, $\sigma^2$, $R_s$. This maximization problem becomes extremely difficult even for as few as two sources. The authors presented an ad-hoc approach, the GLS, which reduces the complexity of the estimator to a search over only q parameters.
Alternatively, by noting that our problem can be modeled as a problem of direction finding with a time-varying array, one can apply the results of [4] which include, among others, expressions for the Cramer-Rao lower bound (CRLB) on the estimation error of the unknown parameters. Also, some conjectures about the complexity of the ML estimator were presented, which suggested that in the general case the ML estimator is not separable.
In [2] it is shown that, unlike the case where the array is sampled simultaneously, in cases where the number of sensors in the sub-array is smaller than the number of sources, the CRLB for $\theta_1, \ldots, \theta_q$ does not approach zero as the SNR approaches infinity, so the time-varying spatial sampling process causes a residual estimation error.
Eigenvector-based methods for the case of time-varying arrays have been proposed in [1]. In that paper two eigenvector-based methods were proposed: one is based on an interpolating matrix and the other is based on a focusing matrix. However, neither method can be applied to our problem due to the large differences in the steering vectors between successive time instants.
3. THE IDENTIFIABILITY PROBLEM
It is well known that when the array is simultaneously sampled, so that (1) holds, and under some very weak conditions on the array, one can localize up to p - 1 sources. Is this also true when only two receivers are used? The following theorem addresses this question:
Theorem 1 Using an array of p sensors and only two receivers, up to q = p - 1 narrowband sources can be uniquely localized.
Proof 1 Let $\bar y(t_i)$ be a column vector with $2\binom{p}{2}$ elements, given by:
$$\bar y(t_i) = \left[y(t_i)^T, y(t_{i+N})^T, y(t_{i+2N})^T, \ldots, y(t_{i+L-N})^T\right]^T \qquad (3)$$
Without loss of generality, assume that first we take N samples of the first and second sensors simultaneously. Next we take another N samples from the first and third sensors simultaneously, and so on. $\bar y(t_1)$ is a column vector, with the first two elements equal to the first sample of the first two sensors sampled. The third and fourth elements of $\bar y(t_1)$ are the two elements of the first sample from the first and third sensors, and so on. It is clear that $\{\bar y(t_i)\}_{i=1}^{N}$ contain all the available samples and thus contain all the statistical information on the unknown parameters.
It can be easily verified that $\{\bar y(t_i)\}_{i=1}^{N}$ are i.i.d. complex Gaussian vectors with a block diagonal correlation matrix $R_y(\theta)$, whose b-th 2 x 2 diagonal block is given by
$$\begin{bmatrix}[R_x(\theta)]_{kk} & [R_x(\theta)]_{kl}\\ [R_x(\theta)]_{lk} & [R_x(\theta)]_{ll}\end{bmatrix} \qquad (4)$$
and whose remaining entries are zero, where k and l are the first and second sensors sampled at the b-th switching. It is clear from the structure of $R_y$ that a simple one-to-one mapping, denoted by $\psi(R_x)$, between $R_x$ and $R_y$ exists.
Let $\theta = [\theta_1, \ldots, \theta_k]$ and $\theta' = [\theta'_1, \ldots, \theta'_{k'}]$ be two sets of bearings, such that $k, k' \leq p - 1$ and $\theta' \neq \theta$. For the case of simultaneous sampling, up to p - 1 sources can be uniquely localized, i.e., $R_x(\theta) \neq R_x(\theta')$ for every $\theta \neq \theta'$. Now, using the fact that $\psi$ is a one-to-one mapping between $R_x$ and $R_y$, it is clear that $R_y(\theta) \neq R_y(\theta')$ for every $\theta \neq \theta'$.
In addition, since $\bar y(t_i)$ is a complex Gaussian vector, the p.d.f. of $\bar y(t_i)$ given $\theta$ is different from the p.d.f. of $\bar y(t_i)$ given $\theta'$, which is a sufficient condition for identifiability.
This theorem provides a very important result: at each time instant we sample a sub-array of size two, which by itself enables us to localize only one source. However, coherently combining the results from all the sub-arrays enables one to localize p - 1 sources, the same number as if we were sampling the whole array with p receivers.
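The one-to-one mapping used in the proof can be made concrete: every entry of $R_x$ appears in at least one 2 x 2 block of $R_y$, so $R_x$ can be rebuilt from the blocks. The following is a small sketch with hypothetical function names (`psi`, `psi_inverse`) and an arbitrary illustrative $R_x$; it stores the blocks in a dictionary keyed by the sensor pair rather than assembling the full block diagonal matrix.

```python
import cmath
import math
from itertools import combinations


def psi(Rx):
    """Map the p x p matrix Rx to the 2 x 2 diagonal blocks of Ry,
    one block per sensor pair (k, l) with k < l."""
    p = len(Rx)
    return {(k, l): [[Rx[k][k], Rx[k][l]], [Rx[l][k], Rx[l][l]]]
            for k, l in combinations(range(p), 2)}


def psi_inverse(blocks, p):
    """Rebuild Rx from the blocks (possible because p >= 2 guarantees
    that every entry of Rx occurs in some block)."""
    Rx = [[0j] * p for _ in range(p)]
    for (k, l), blk in blocks.items():
        Rx[k][k], Rx[k][l] = blk[0][0], blk[0][1]
        Rx[l][k], Rx[l][l] = blk[1][0], blk[1][1]
    return Rx


# Illustrative Rx: one unit-power source at 20 degrees on a 4-element
# half-wavelength ULA plus unit-variance white noise (assumed values).
p = 4
a = [cmath.exp(1j * math.pi * i * math.sin(math.radians(20.0))) for i in range(p)]
Rx = [[a[i] * a[j].conjugate() + (1.0 if i == j else 0.0) for j in range(p)]
      for i in range(p)]

blocks = psi(Rx)
print(len(blocks), psi_inverse(blocks, p) == Rx)
```

The number of blocks equals the number of sensor pairs, p(p-1)/2, which is why the stacked vector in (3) has twice that many elements.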
4. EIGENVECTOR BASED METHODS
The ML estimator for $\theta$ requires at least a q-dimensional search. Eigenvector-based methods, such as MUSIC, offer a way to reduce the complexity to a one-dimensional search. This reduction in complexity is crucial since, even today, with the most advanced DSPs, a search in a space of more than two dimensions cannot be performed in real time.
We next describe a new eigenvector-based procedure which can be used in our problem. We start with the following equivalent description of the data:
Let $z(t_i)$ be a column vector with p elements. Let all the elements be equal to zero except, say, the k-th and l-th elements, which are equal to $[x(t_i)]_k$ and $[x(t_i)]_l$, respectively. That is, k, l are the two array elements which are sampled at time $t_i$. Now, denote by $\hat R_z = \frac{1}{N}\sum_{i=1}^{L} z(t_i)z^H(t_i)$ the empirical correlation matrix; it can be shown that its expected value is given by:
$$R_z = A(\theta)R_sA^H(\theta) + \sigma^2I + \Lambda \qquad (5)$$
where $\Lambda$ is a diagonal matrix whose diagonal entries are $(p-2)\cdot\mathrm{diag}(A(\theta)R_sA^H(\theta) + \sigma^2I)$, since each sensor takes part in p - 1 of the sub-arrays. The matrix $\Lambda$ is the only difference between the mean of the sample covariance matrix in the case where the array is sampled simultaneously, where eigenvalue-based methods are easily applied, and the mean of the sample covariance matrix where only two receivers are used simultaneously.
However, if all the elements of the diagonal matrix $\Lambda$ are equal, then eigenvector-based methods for estimating $\theta$ can still be used, since $\Lambda$ is just added to the noise covariance matrix, so it effectively changes the (unknown) noise level. There are two sufficient conditions for all the elements of $\Lambda$ to be equal:
1. All sources are uncorrelated, so $R_s$ is a diagonal matrix.
2. All the array elements are omnidirectional, such that $|a_i(\theta)| = |a_j(\theta)|\ \forall i \neq j$ and for any $\theta$.
However, since these conditions are rarely fully fulfilled in practice, MUSIC-like procedures cannot be applied to $\hat R_z$ directly.
A careful examination of $R_z(\theta)$ and of $R_x(\theta)$ shows that their off-diagonal elements are the same, while the diagonal elements of $R_z(\theta)$ are p - 1 times larger, $\forall\theta$. We therefore suggest a non-linear pre-processing procedure: to divide the diagonal elements of $\hat R_z$ by p - 1. Denote by $\tilde R_z$ the resulting matrix; it can be easily verified that $E\{\tilde R_z\} = R_x(\theta)$, and thus $\tilde R_z$ can be used with all the eigenvector-based methods, e.g., MUSIC. We refer to MUSIC with the suggested preprocessing as MMUSIC. Naturally, the performance of MUSIC and of MMUSIC applied to the same array will be different, since only the first moment (the expected value) of $\hat R_x$ and of $\tilde R_z$ is the same.
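The effect of the diagonal correction can be verified on the expected matrices themselves. The sketch below builds the mean of the two-receiver correlation matrix by summing, over all sensor pairs, the part of $R_x$ that each pair observes, and then divides the diagonal by p - 1. It assumes that the sum of $z z^H$ is normalized by N; the function names and the example parameters (4-element half-wavelength ULA, sources at 0° and 15° with correlation 0.25, noise level 0.1) are illustrative choices.

```python
import cmath
import math
from itertools import combinations


def expected_rz(Rx):
    """Mean of (1/N) * sum z z^H when every sensor pair is sampled N times
    (assumed normalization)."""
    p = len(Rx)
    Rz = [[0j] * p for _ in range(p)]
    for k, l in combinations(range(p), 2):
        # z keeps only entries k and l of x, so E{z z^H} keeps only the
        # (k,k), (k,l), (l,k), (l,l) entries of Rx.
        for i in (k, l):
            for j in (k, l):
                Rz[i][j] += Rx[i][j]
    return Rz


def mmusic_preprocess(Rz):
    """Divide the diagonal of Rz by p - 1 (the suggested preprocessing)."""
    p = len(Rz)
    out = [row[:] for row in Rz]
    for i in range(p):
        out[i][i] = out[i][i] / (p - 1)
    return out


p = 4
angles = [0.0, 15.0]
A = [[cmath.exp(1j * math.pi * i * math.sin(math.radians(th))) for th in angles]
     for i in range(p)]
Rs = [[1.0, 0.25], [0.25, 1.0]]
sigma2 = 0.1
Rx = [[sum(A[i][m] * Rs[m][n] * A[j][n].conjugate()
           for m in range(2) for n in range(2)) + (sigma2 if i == j else 0.0)
       for j in range(p)] for i in range(p)]

Rz = expected_rz(Rx)
Rz_tilde = mmusic_preprocess(Rz)
max_err = max(abs(Rz_tilde[i][j] - Rx[i][j]) for i in range(p) for j in range(p))
print(abs(Rz[0][0] / Rx[0][0]), max_err)
```

With finite data the same division is applied to the sample matrix, so the corrected matrix matches $R_x$ only on average, which is the caveat about first moments made above.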
This method can be extended to cases where the number of samples taken from each sensor is not equal. Let $n_i$ be the number of samples taken at the i-th switching. Let $\hat R_z = \sum_i z(t_i)z^H(t_i)$. It can be verified that the mean of $\hat R_z$ is given by:
$$E\{\hat R_z\} = \left(A(\theta)R_sA^H(\theta) + \sigma^2I\right)\odot\Psi \qquad (6)$$
where $[\Psi]_{ij}$ is the total number of snapshots taken from the i-th and j-th sensors simultaneously, $[\Psi]_{ii}$ is the total number of snapshots taken from the i-th sensor, and $\odot$ denotes element-by-element matrix multiplication. The suggested preprocessing in this case is to divide each element of $\hat R_z$ by the corresponding element of $\Psi$. The resulting matrix, denoted again by $\tilde R_z$, can be used with any eigenvalue-based method.
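A possible way to organize this computation is to accumulate the unnormalized sum of $z z^H$ and the count matrix $\Psi$ in the same pass over the data, and divide element by element at the end. The sketch below assumes a hypothetical input format, a list of tuples (k, l, x_k, x_l), and requires that every sensor pair is sampled at least once so that no entry of $\Psi$ is zero. The toy data use a constant array output, so the expected result is known exactly.

```python
def accumulate(p, snapshots):
    """Sum z z^H and count, per entry, how many snapshots contributed.

    snapshots: iterable of (k, l, xk, xl), i.e. sensors k != l sampled
    together and their complex outputs (hypothetical input format).
    """
    Rz = [[0j] * p for _ in range(p)]
    Psi = [[0] * p for _ in range(p)]
    for k, l, xk, xl in snapshots:
        for i, xi in ((k, xk), (l, xl)):
            for j, xj in ((k, xk), (l, xl)):
                Rz[i][j] += xi * xj.conjugate()
                Psi[i][j] += 1
    return Rz, Psi


def normalize(Rz, Psi):
    """Element-wise division of Rz by Psi (every pair must be sampled)."""
    p = len(Rz)
    return [[Rz[i][j] / Psi[i][j] for j in range(p)] for i in range(p)]


# Toy data: the same array output x0 at every instant, with unequal numbers
# of snapshots per sensor pair (2, 3 and 1).
p = 3
x0 = [1 + 1j, 2 + 0j, -1j]
schedule = [(0, 1)] * 2 + [(0, 2)] * 3 + [(1, 2)] * 1
snapshots = [(k, l, x0[k], x0[l]) for k, l in schedule]

Rz, Psi = accumulate(p, snapshots)
Rz_tilde = normalize(Rz, Psi)
print(Psi)
print(Rz_tilde[0][2], x0[0] * x0[2].conjugate())
```

Here the diagonal of $\Psi$ counts all snapshots in which a sensor took part, so the equal-count case reduces to dividing the diagonal by p - 1 relative to the off-diagonal entries.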
5. SIMULATION STUDY
Consider a uniform linear array with 4 omni-directional el¬
ements. Assume two equi-power, partially correlated (p =
0.25) sources at bearings 0°, 15° and N = 100. In Figure
1 a typical spectrum of the MM U SIC is shown. For com¬
parison, we show a typical spectrum of the MUSIC which
has been applied on Rz without preprocessing. It shows
that without preprocessing the two sources are not resolved,
so, as predicted, the MUSIC cannot be used directly for
multiple source localization.
The MUSIC and MMUSIC norrftalized cost functions
Figure 1: Typical MUSIC and M MU SIC cost functions
We now present results of a simulation performance study
for the same experiment. Figures 2 and 3 depict the proba¬
bility of detecting two sources and the M SE of the bearing
of the first source, respectively, for various correlation co¬
efficients, as a function of the SNR. These results are based
on averaging of 1000 Monte Carlo Runs.
Figure 2: The probability of detecting two sources as a func¬
tion of the SNR.
Figures 4 and 5 depict the probability of detecting two
sources and the MSE of the bearing of the first source, re¬
spectively, as a function of the number of snapshots, where
the SNR is fixed at 10 dB.
Generally speaking, this study suggests that the performance
of MMUSIC improves as the SNR increases,
as the number of snapshots increases, and as the correlation
between the sources decreases. Our future work
will focus on an analytic performance analysis of the algorithm
so that its inherent limitations can be explored.
Figure 3: The MSE of the bearing of the first source as a function of the SNR.
Figure 4: The probability of detecting two sources as a function of the number of snapshots.
Figure 5: The MSE of the bearing of the first source as a function of the number of snapshots.
6. REFERENCES
[1] B. Friedlander and A. Zeira, “Eigenstructure-based algorithms
for direction finding with time-varying arrays,”
IEEE Trans. on AES, vol. 32, pp. 689-701,
April 1996.
[2] J. Sheinvald, “On Detection and Localization of Multiple
Signals by Sensor Arrays,” Ph.D. Dissertation, Tel
Aviv University, Israel.
[3] J. Sheinvald and M. Wax, “Direction Finding with
Fewer Receivers via Time-Varying Preprocessing,”
IEEE Trans. on SP, vol. 47, pp. 2-10, January 1999.
[4] A. Zeira and B. Friedlander, “Direction Finding with
Time Varying Arrays,” IEEE Trans. on SP, vol. 43, pp.
927-938, 1995.
[5] M. A. Doron, A. J. Weiss and H. Messer, “Maximum
Likelihood direction finding of wide band sources,”
IEEE Trans. on SP, vol. 41, pp. 411-414, 1993.
SELF-STABILIZED MINOR SUBSPACE EXTRACTION ALGORITHM BASED
ON HOUSEHOLDER TRANSFORMATION
K. Abed-Meraim*, S. Attallah**, A. Chkeif*, Y. Hua***
* Telecom Paris, TSI Dept. 46, rue Barrault, 75634, Paris Cedex 13 France.
** Centre for wireless communication, National University of Singapore, Singapore.
*** The University of Melbourne, Elec. Eng. Dept. Parkville, Vic. 3052, Australia.
E-mails: abed, chkeif@tsi.enst.fr, cwcsa@leonis.nus.edu.sg, yhua@ee.mu.oz.au.
ABSTRACT
In this paper, we propose an orthogonalized version
of the OJA algorithm (OOJA) that can be used for the
estimation of minor and principal subspaces of a vector
sequence. Compared to OJA, the new algorithm offers
orthogonality of the weight matrix, which is ensured at each
iteration, numerical stability, and a very similar
computational complexity.
1. INTRODUCTION
Principal and minor component analysis (PCA and MCA),
which are part of the more general principal and minor
subspace analysis (PSA and MSA), are two important
problems that are frequently encountered in many
information processing fields.
Let $\{r(k)\}$ be a sequence of $N \times 1$ random vectors
with covariance matrix $C = E[r(k)r^T(k)]$. Consider
the problem of extracting the principal or the minor
subspace spanned by the sequence, of dimension
$P < N$, assumed to be the span of the $P$ principal
or minor eigenvectors of the covariance matrix, respectively.
To solve this problem, several subspace extraction
algorithms have been proposed [1]-[5]. The
minor subspace extraction algorithm of Oja et al. [4]
can be formulated as
$$W(i+1) = W(i) - \beta\left[r(i)y^T(i) - W(i)y(i)y^T(i)\right] = W(i) - \beta p(i)y^T(i), \qquad (1)$$
where $W(i) \in \mathbb{R}^{N \times P}$ is the minor subspace estimate,
$y(i) = W^T(i)r(i)$, $p(i) = r(i) - W(i)y(i)$, and $\beta > 0$
is a learning parameter. Reversing the sign of the
adaptive gain, i.e., replacing $-\beta$ in (1) by $+\beta$, yields
a principal subspace extraction algorithm. Chen et al.
have proposed a novel MSA algorithm [5] which can be
written as follows:
$$W(i+1) = W(i) - \beta\left[r(i)y^T(i)W^T(i)W(i) - W(i)y(i)y^T(i)\right]. \qquad (2)$$
The discrete-time update of (2) suffers from a marginal
instability similar to the PCA ($P = 1$) algorithm in [2].
Recently, a novel self-stabilizing MSA algorithm given
by
$$W(i+1) = W(i) - \beta\left[r(i)y^T(i)W^T(i)W(i)W^T(i)W(i) - W(i)y(i)y^T(i)\right], \qquad (3)$$
has been proposed by Douglas et al. in [3].
2. ORTHOGONAL OJA
Our algorithm consists of (1) plus an orthogonalization
step of the weight matrix to be performed at each iteration.
Orthogonality is an important property that
is desired in many subspace-based estimation methods
[6]. To this end, we set (using informal notation):
$$W(i+1) := W(i+1)\left(W^T(i+1)W(i+1)\right)^{-1/2} \qquad (4)$$
where $(W^T(i+1)W(i+1))^{-1/2}$ denotes an inverse
square root of $W^T(i+1)W(i+1)$. To compute the
latter, we use the updating equation of $W(i+1)$. Keeping
in mind that $W(i)$ is now an orthogonal matrix, we
have
$$W^T(i+1)W(i+1) = I + \beta^2\|p(i)\|^2y(i)y^T(i) = I + xx^T,$$
where we have used the fact that $W^T(i)p(i) = 0$, $I$ is
the identity matrix, and $x = \beta\|p(i)\|y(i)$. Using
$$(I + xx^T)^{-1/2} = I + \left(\frac{1}{\sqrt{1+\|x\|^2}} - 1\right)\frac{xx^T}{\|x\|^2},$$
we obtain
$$\left(W^T(i+1)W(i+1)\right)^{-1/2} = I + \tau(i)y(i)y^T(i), \qquad (5)$$
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
where $\tau(i) = \frac{1}{\|y(i)\|^2}\left(\frac{1}{\sqrt{1+\beta^2\|p(i)\|^2\|y(i)\|^2}} - 1\right)$. Substituting
(5) into (4) and using the updating equation
of $W(i+1)$ leads to
$$W(i+1) = \left(W(i) - \beta p(i)y^T(i)\right)\left(I + \tau(i)y(i)y^T(i)\right) = W(i) - \beta\bar{p}(i)y^T(i), \qquad (6)$$
where $\bar{p}(i) = -\tau(i)W(i)y(i)/\beta + (1 + \tau(i)\|y(i)\|^2)p(i)$.
Thus, the algorithm can be written as
• Initialization of the algorithm:
$W(0)$ = any arbitrary orthogonal matrix.
• Algorithm at iteration $i$:
$y(i) = W^T(i)r(i)$
$z(i) = W(i)y(i)$
$p(i) = r(i) - z(i)$
$\phi(i) = 1/\sqrt{1 + \beta^2\|p(i)\|^2\|y(i)\|^2}$
$\tau(i) = (\phi(i) - 1)/\|y(i)\|^2$
$\bar{p}(i) = -\tau(i)z(i)/\beta + \phi(i)p(i)$
$W(i+1) = W(i) - \beta\bar{p}(i)y^T(i)$
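The iteration above can be sketched in a few lines of Python/NumPy. This is only an illustrative transcription of the listed steps, not the authors' code: the function name `ooja_step`, the white test input, and the guard for $y(i) = 0$ are assumptions made for the example.

```python
import numpy as np

def ooja_step(W, r, beta):
    """One OOJA iteration following the steps listed above.

    W    : (N, P) weight matrix with orthonormal columns
    r    : (N,) data vector
    beta : learning parameter (beta > 0 for the minor subspace)
    """
    y = W.T @ r
    z = W @ y
    p = r - z
    y2 = float(y @ y)
    if y2 == 0.0:                      # guard added for the sketch only
        return W.copy()
    phi = 1.0 / np.sqrt(1.0 + beta**2 * float(p @ p) * y2)
    tau = (phi - 1.0) / y2
    p_bar = -tau * z / beta + phi * p  # modified direction of (6)
    return W - beta * np.outer(p_bar, y)

# Short run on white test data (hypothetical input, for illustration).
rng = np.random.default_rng(0)
N, P, beta = 4, 2, 0.01
W, _ = np.linalg.qr(rng.standard_normal((N, P)))   # orthonormal start
for _ in range(20):
    W = ooja_step(W, rng.standard_normal(N), beta)
orth_error = np.linalg.norm(W.T @ W - np.eye(P))
```

Over a short run such as this one, `orth_error` stays at rounding level, which is the per-iteration orthogonality property the derivation is built to provide.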
In order to gain more insight into the OOJA algorithm,
we examine the following points:
1. Minor subspace:
In terms of orthogonality errors, the OOJA algorithm
guarantees the orthogonality of the weight matrix
at each iteration. With the orthogonalization, the
three algorithms (1), (2), and (3) become identical.
However, simulation results show that the
discrete-time update of the OOJA algorithm is sensitive
to the propagation of rounding errors.
Fortunately, we can overcome this problem by
reformulating the algorithm equations as shown in
Section 3.
2. Principal subspace:
With respect to subspace errors, our algorithm
converges at the same rate as (1). In terms of or¬
thogonality errors, it guarantees the orthogonal¬
ity of the weight matrix at each iteration, whereas
(1) converges to an orthogonal weight matrix only
asymptotically. Finally, it is worth noting that
(3) quickly diverges for PSA.
3. Computational complexity:
The computational complexities of algorithms (3)
and (2) are $7NP + O(N)$ and $5NP + O(N)$ flops
per iteration, respectively. OOJA and (1), however,
cost only $3NP + O(N)$ flops per iteration. It
is interesting to note that the orthogonalization
step does not increase the computational cost of
the OJA algorithm. In addition, the updating
equation of the weight matrix of the OOJA algorithm
has a more compact form than (2) and (3), i.e.,
it uses only one outer product instead of two for
(2) and (3). This turns out to be useful when
a subspace extraction algorithm is cascaded with
other adaptive algorithms, e.g., [8].
4. Convergence:
The convergence of the OOJA algorithm follows directly
from that of the OJA algorithm [7]. In fact, (6)
can be rewritten as $W(i+1) = W(i) - \beta p(i)y^T(i) + O(\beta^2)$.
Therefore, for $\beta \ll 1$, it can be shown
that the two algorithms have the same convergence
performance.
On the other hand, the convergence proof of (3) is
not complete. Indeed, to prove that span[$W$]
converges to span[$E_2$], where $E_2$ is the minor $P$-dimensional
subspace spanned by the eigenvectors
corresponding to the $P$ smallest eigenvalues,
Douglas et al. [3] have used the following assumption:
If all the eigenvalues of $M(t)$ have negative real
parts, then for the system
$$\frac{dQ(t)}{dt} = M(t)Q(t),$$
we have
$$\lim_{t \to \infty} Q(t) = 0.$$
This assumption is true if $M(t)$ is time invariant
but not always true when $M(t)$ is time variant, as
shown by the counterexamples given in [10, 11].
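The $O(\beta^2)$ remainder in point 4 can be checked with a short expansion in $\beta$. The lines below use only the definitions of $\phi(i)$, $\tau(i)$ and $z(i)$ from the algorithm listing; $\bar{p}(i)$ denotes the modified direction in (6), and the time index is dropped inside the norms for brevity.

```latex
\begin{aligned}
\phi(i) &= \left(1+\beta^2\|p\|^2\|y\|^2\right)^{-1/2}
         = 1 - \tfrac{1}{2}\beta^2\|p\|^2\|y\|^2 + O(\beta^4),\\
\tau(i) &= \frac{\phi(i)-1}{\|y\|^2}
         = -\tfrac{1}{2}\beta^2\|p\|^2 + O(\beta^4),\\
\beta\,\bar{p}(i) &= -\tau(i)\,z(i) + \beta\,\phi(i)\,p(i)
         = \beta\,p(i) + \tfrac{1}{2}\beta^2\|p\|^2\,z(i) + O(\beta^3),\\
W(i+1) &= W(i) - \beta\,p(i)\,y^T(i)
         - \tfrac{1}{2}\beta^2\|p(i)\|^2\,z(i)\,y^T(i) + O(\beta^3).
\end{aligned}
```

The first-order term is exactly the OJA update (1), so the two recursions differ only at second order in $\beta$.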
3. IMPLEMENTATION USING
HOUSEHOLDER TRANSFORMATION
Because of the numerical instability of OOJA when
used for minor subspace estimation, we propose here
another implementation of the algorithm based on House¬
holder transformation. In fact, the new implementa¬
tion can be derived from a reformulation of (6) in terms
of Householder transformation. We have the following
result:
Proposition 1 Let $u(i) = \bar{p}(i)/\|\bar{p}(i)\|$. Then equation
(6) can be rewritten as
$$W(i+1) = H(i)W(i) \qquad (7)$$
where $H(i)$ is the Householder transformation given by
$$H(i) = I - 2u(i)u^T(i).$$
Based on this result (see appendix for proof), the
new implementation consists in computing successively
$y(i)$, $p(i)$, $\tau(i)$, and $\bar{p}(i)$. Then, we compute
$u(i) = \bar{p}(i)/\|\bar{p}(i)\|$
$v(i) = W^T(i)u(i)$
$W(i+1) = W(i) - 2u(i)v^T(i)$
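A minimal Python/NumPy sketch of the Householder form of the update is given below. It is an illustration under stated assumptions rather than the authors' implementation: the name `ooja_householder_step`, the random test input, and the early return when $y(i)$ or the modified direction vanishes are all choices made for the example.

```python
import numpy as np

def ooja_householder_step(W, r, beta):
    """One OOJA iteration written as W(i+1) = H(i) W(i), H = I - 2 u u^T."""
    y = W.T @ r
    z = W @ y
    p = r - z
    y2 = float(y @ y)
    if y2 == 0.0:                       # guard added for the sketch only
        return W.copy()
    phi = 1.0 / np.sqrt(1.0 + beta**2 * float(p @ p) * y2)
    tau = (phi - 1.0) / y2
    p_bar = -tau * z / beta + phi * p
    norm = np.linalg.norm(p_bar)
    if norm == 0.0:                     # nothing to reflect (assumption)
        return W.copy()
    u = p_bar / norm
    v = W.T @ u
    return W - 2.0 * np.outer(u, v)     # applies H without forming it

rng = np.random.default_rng(1)
W0, _ = np.linalg.qr(rng.standard_normal((4, 2)))
W1 = ooja_householder_step(W0, rng.standard_normal(4), beta=0.01)
orth_error = np.linalg.norm(W1.T @ W1 - np.eye(2))
```

Because the update is a reflection applied to `W`, the columns stay orthonormal up to rounding regardless of how accurately $\tau(i)$ was computed.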
Since the decomposition of the weight matrix involves
the use of numerically well-behaved Householder orthogonal
matrices (see [9] pp. 209-213), OOJA becomes
numerically very stable. The new implementation has
a computational complexity of $4NP + O(N)$ flops
per iteration.
4. SIMULATION RESULTS
Example 1: In this example, we choose $r(i)$ to be a
sequence of independent jointly-Gaussian random vectors
with covariance matrix
$$C = \begin{pmatrix} 0.9 & 0.4 & 0.7 & 0.3 \\ 0.4 & 0.3 & 0.5 & 0.4 \\ 0.7 & 0.5 & 1.0 & 0.6 \\ 0.3 & 0.4 & 0.6 & 0.9 \end{pmatrix}, \qquad (8)$$
$P = 2$, $\beta = 0.01$, and as recommended in [5] $W(0) = D$,
where $D_{ij} = \delta(j - i)$. As in [3], we calculate the
ensemble averages of the performance factors
$$\rho(i) = \frac{1}{r_0}\sum_{r=1}^{r_0}\frac{\mathrm{tr}\left(W_r^T(i)E_1E_1^TW_r(i)\right)}{\mathrm{tr}\left(W_r^T(i)E_2E_2^TW_r(i)\right)}, \qquad (9)$$
$$\eta(i) = \frac{1}{r_0}\sum_{r=1}^{r_0}\left\|W_r^T(i)W_r(i) - I\right\|_F^2, \qquad (10)$$
where the number of algorithm runs is $r_0 = 100$, $r$ indicates
that the associated variable depends on the particular
run, $\|\cdot\|_F$ denotes the Frobenius norm, and $E_1$
(respectively $E_2$) is the principal $(N - P)$-dimensional
subspace (respectively minor $P$-dimensional subspace).
Figure 1 compares the performance of OOJA (without
Householder implementation) with (1), (2), and (3).
As we can see, our algorithm behaves better than (1)
and (2), but still suffers from numerical instability.
Example 2: In this example all parameters are
kept the same as in the first example. Figure 2 shows
the performance of the Householder-based OOJA algorithm
as compared to (1), (2), and (3). We can see that the
new implementation is numerically stable.
Example 3: We consider here the same context as
in the previous examples. By reversing the sign of $\beta$,
we now extract the principal $P$-dimensional subspace.
In (9), we replace $E_1$ by $E_2$ and vice versa. As we can
see from Figure 3, our algorithm (without Householder
implementation) is numerically stable and has better
performance than (1), (2), and (3).
5. CONCLUSIONS
In this paper, we proposed an orthogonal OJA (OOJA)
algorithm that can perform both PCA and MCA by
simply switching the sign of the same learning rule.
We gave two fast implementations of OOJA where the
orthogonality of the weight matrix is ensured at each
iteration. OOJA is numerically stable and its computational
complexity is smaller than those reported in
[3] and [5].
6. APPENDIX
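Proof of Proposition 1: Using the definition¹ of $y$ we can
write $\bar{p}y^T = \bar{p}r^TW$. By decomposing the observation
vector as
$$r = WW^Tr + (I - WW^T)r = Wy + p,$$
we can write
$$\bar{p}r^TW = \bar{p}\,(Wy + p)^TW = -\frac{\beta}{\tau}\,\bar{p}\left(-\frac{\tau}{\beta}Wy + (1+\tau\|y\|^2)p\right)^TW = -\frac{\beta}{\tau}\,\bar{p}\bar{p}^TW,$$
where the second equality comes from the fact that
$p^TW = 0$. Finally, we obtain $W(i+1) = \left(I + \frac{\beta^2}{\tau}\bar{p}(i)\bar{p}^T(i)\right)W(i)$.
To complete the proof we have to show that
$\frac{\beta^2}{\tau} = -\frac{2}{\|\bar{p}\|^2}$, or equivalently $\|\bar{p}\|^2 = -\frac{2\tau}{\beta^2}$.
Using the definition of $\bar{p}$ and the equality $1 + \tau\|y\|^2 =
(1 + \beta^2\|p\|^2\|y\|^2)^{-1/2}$, we can write
$$\begin{aligned}
\|\bar{p}\|^2 &= \frac{\tau^2\|y\|^2}{\beta^2} + \frac{\|p\|^2}{1 + \beta^2\|p\|^2\|y\|^2} \\
&= \frac{\tau^2\|y\|^2}{\beta^2} + \frac{1}{\beta^2\|y\|^2}\left(1 - \frac{1}{1 + \beta^2\|p\|^2\|y\|^2}\right) \\
&= \frac{1}{\beta^2\|y\|^2}\left((\tau\|y\|^2)^2 + 1 - (1+\tau\|y\|^2)^2\right) \\
&= -\frac{2\tau}{\beta^2}. \qquad \square
\end{aligned}$$
¹ Here, we omit the time index $i$ to simplify the notation.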
Figure 1: Average behaviors for MSA (horizontal axis: number of iterations).
Figure 2: Average behaviors for MSA using
Householder-based implementation (horizontal axis: number of iterations).
Figure 3: Average behaviors for PSA.
7. REFERENCES
[1] Y. Hua, Y. Xiang, T. Chen, K. Abed-Meraim, and Y.
Miao, “A New Look at the Power Method for Fast Subspace
Tracking,” Digital Signal Processing, Academic
Press, Oct. 1999.
[2] T. P. Krasulina, “Method of Stochastic Approxima¬
tion in the Determination of the Largest Eigenvalue of
the Mathematical Expectation of Random Matrices,”
Automat. Remote Contr., vol 2, pp. 215-221, 1970.
[3] S. C. Douglas, S.-Y. Kung, and S.-I. Amari, “A Self-
Stabilized Minor Subspace Rule,” Sig. Process. Let¬
ters, vol. 5, no. 12, pp. 328-330, Dec. 1998.
[4] E. Oja, “Principal Components, Minor Components,
and Linear Neural Networks,” Neural Networks, vol.
5, pp. 927-935, Nov./Dec. 1992.
[5] T. Chen, S.-I. Amari, and Q. Lin, “A Unified Algo¬
rithm for Principal and Minor Components Extrac¬
tion,” Neural Networks, vol. 11, pp. 385-390, 1998.
[6] S. Marcos, A. Marsal, and M. Benidir, “The Prop¬
agator Method for Source Bearing Estimation,” Sig.
Proc., vol 42, pp. 121-138, Apr. 1995.
[7] T. Chen, Y. Hua, and W. Yan, “Global Convergence
of Oja’s Subspace Algorithm for Principal Component
Extraction,” IEEE Trans. on Neural Networks, pp. 58-67,
Jan. 1998.
[8] A. Chkeif, K. Abed-Meraim, G. Kawas Kaleh, and Y.
Hua, “Blind Adaptive Multiuser Detection With Antenna
Array,” accepted for publication in IEEE Trans.
on Comm.
[9] G. H. Golub and C. F. Van Loan, “Matrix Computations,”
The Johns Hopkins University Press, 1996.
[10] L. Markus and H. Yamabe, “Global Stability Criteria
for Differential Systems,” J. Osaka Math., vol. 12, pp.
305-317, 1960.
[11] R. E. Vinograd, “Remark on the Critical Case of Sta¬
bility of a Singular Point in the Plane” Doklady Akad.
Nauk, vol. 101, pp. 209-212, 1955.
A BOOTSTRAP TECHNIQUE FOR RANK ESTIMATION
Per Pelin, Ramon Brcich and Abdelhak Zoubir
Australian Telecommunications Research Institute 1 (ATRI), Curtin University of Technology,
GPO Box U 1987, Perth WA 6845, Australia. E-mail: pelle@atri.curtin.edu.au
ABSTRACT
A crucial step in many signal processing applications is the
determination of the effective rank of a noise corrupted
multi-dimensional signal, i.e., the dimension of the signal
subspace. Standard techniques for rank estimation, such as
the minimum description length, often have shortcomings
in practice, an example being when noise parameters are
unknown. An alternative scheme is proposed for rank
detection. From successive pairs of the ordered eigenvalues
of the array covariance, a series of statistics is formed. The
statistics are chosen such that their distributions for noise
eigenvalue pairs are close. The actual distributions are
unknown and are estimated with the Bootstrap. The rank is
then found by a sequential comparison of the estimated dis¬
tributions using a Kolmogorov-Smirnov test.
1. INTRODUCTION
Many signal processing algorithms, such as direction finding
algorithms, rely on the low-rank structure of a multi-dimensional
signal. The rank typically has an interpretation
as the model order, revealing the number of signals hidden
in noise, or the dimension of a low-order signal subspace.
Therefore, finding the effective rank of a noise corrupted
signal is a crucial initial step in many applications.
Classical techniques to estimate the rank when the
noise is Gaussian include the minimum description length
(MDL) and Akaike’s information theoretic criterion (AIC)
[10], and their subjective counterpart, the sphericity test [2].
In the latter, a threshold is set to obtain a desired level of
the test, whereas in the objective MDL and AIC, the actual
threshold is dependent on the data size by asymptotic arguments.
Nevertheless, they all rely on the structure of the
noise eigenvalues of the covariance matrix, and it is
required that the actual spatial noise color is known. If the
noise assumptions are violated, for example, when the
noise has an unknown spatial color, detection performance
is degraded. For noise of unknown color, an alternative to
eigenvalue-based tests is to use properties of canonical correlations
[2], as in [11][12]. However, these schemes put
some restrictions on the structure of the data model, limiting
their applicability.
1. This work was in part supported by the Australian Telecommunica¬
tions Cooperative Research Centre (AT-CRC).
To mitigate the problem of slight uncertainties in the noise
model, both w.r.t. possible non-Gaussianity and noise
color, a new technique for rank detection is proposed. The
detection procedure is based on a property of the marginal
distributions of the noise sample eigenvalues. Instead of
relying on parametric assumptions, these distributions are
estimated from the data using the Bootstrap [5]. Based on
these estimates, the distributions of a series of secondary
variables are estimated, on which the actual rank estima¬
tion is performed using a robust Kolmogorov-Smirnov test
[7]. The necessary number of Bootstrap resamples is sur¬
prisingly small, keeping the computational cost at a reason¬
able level.
2. MODELING
Consider $m$-variate data according to the linear model
$$x(n) = As(n) + v(n) \qquad (1)$$
where $A$ is a mixture matrix (for example the array steering
matrix in sensor array processing), $s(n)$ is a vector of
signals, and $v(n)$ is noise from some possibly unknown
distribution. Assuming the signal and noise are uncorrelated
and zero-mean, the array covariance is
$$R_x = E[x(n)x^H(n)] = AR_sA^H + R_v. \qquad (2)$$
The problem considered is to determine the rank of the signal
part/subspace, i.e., $d = \mathrm{rank}(AR_sA^H)$, based on $N$
observations of the data (1).
If the additive noise is spatially white, $R_v = \sigma^2 I$, the
(population) eigenvalues of (2) are
$$\lambda_1 \geq \ldots \geq \lambda_d > \lambda_{d+1} = \ldots = \lambda_m = \sigma^2, \qquad (3)$$
i.e., the true noise eigenvalues are all equal. However, when
calculated from the sample covariance
$$\hat{R}_x = \frac{1}{N}\sum_{n=1}^{N} x(n)x^H(n), \qquad (4)$$
estimated from a finite number $N$ of data snapshots, the
ordered sample eigenvalues are distinct with probability
one, i.e.,
$$\hat\lambda_1 > \ldots > \hat\lambda_d > \hat\lambda_{d+1} > \ldots > \hat\lambda_m > 0. \qquad (5)$$
The distribution $F_{N\lambda}(\hat\lambda)$ of (5) for a data sample of $N$
snapshots, either in the form of a probability density function
(PDF), or a cumulative distribution function (CDF),
tends to take a very complex form. The sample eigenvalues
are biased (as in (5)) and mutually correlated. The exact
distribution is only known for the Gaussian case with cer¬
tain population eigenvalues, and is given in the form of a
series expansion [8]. For the general case, both w.r.t. the
actual source distribution and the population eigenvalues,
the distribution (joint or marginals) may only be available
asymptotically, for large $N$ [1][8]. For small/moderate $N$,
corresponding to many practical applications, the error in
the asymptotics may be substantial. Thus, there is no general
‘ease of use’ form of $F_{N\lambda}(\hat\lambda)$ available.
Instead of relying on asymptotic results, which are
unreliable on short data records, the detection scheme to be
presented in the next section will be based on an approxi¬
mate relation between the marginal distributions of the
noise sample eigenvalues. Specifically, numerical experiments
indicate that for the white noise case (3), the marginal
PDFs of the noise sample eigenvalues are
approximately related as
$$f_{N,i}(\hat\lambda_i) = f_N(\kappa^i\hat\lambda_i), \quad i \geq d+1 \qquad (6)$$
for some $\kappa$, i.e., the marginals $f_{N,i}(\hat\lambda_i)$, $i \geq d+1$, are simply
scaled versions of the same basic PDF $f_N(\cdot)$. While there is
no claim of the generality of this approximation, it has been
shown to be very precise when the ratio $N/m$ is, say, five or
higher. Also, what is important for detection based on this
property is that the approximation is robust to slightly
colored noise, and practically invariant to non-Gaussianity.
Then, even if the data does not correspond perfectly to the
assumed data model, (6) allows for robust rank detection.
An example to illustrate (6) will be given in Section 4.
3. DETECTION
3.1. Detection principle
To indicate how the relation (6) can be used for rank estimation,
assume $m$ independent variables $\eta_i$
having distributions identical to the marginal distributions
of the sample eigenvalues $\hat\lambda_i$. From the $\eta_i$, $m-1$ secondary
variables $v_i$ are formed as the ratios
$$v_i = \eta_i/\eta_{i+1}, \quad i \in [1, m-1]. \qquad (7)$$
Then, up to the order of the approximation given in (6), $v_i$
for $i \in [d+1, m-1]$ will have identical distributions, as
these $v_i$ are invariant to the (possibly unknown) scaling $\kappa$.
However, $v_d = \eta_d/\eta_{d+1}$, involving the marginal of the
smallest signal eigenvalue, will tend to larger values than
$v_{d+1}$. This forms the basis for rank detection: if the marginals
can be captured from the data
$x(n)$, $n \in [1, N]$, the order $d$ can be estimated by testing
for equality among the distributions of $v_i$, $i \in [1, m-1]$.
A practical algorithm to exploit this property for rank
estimation is as follows:
1. Use the Bootstrap to first estimate the marginals of the
sample eigenvalues $f_{N,i}(\hat\lambda_i)$, $i \in [1, m]$, and then the
distributions of $v_i$, $i \in [1, m-1]$.
2. Apply the Kolmogorov-Smirnov test [7] to test for
pair-wise equality of the distributions $F_{v,i}(v_i)$ of $v_i$,
starting from the bottom ($F_{v,m-2}(v_{m-2})$ versus
$F_{v,m-1}(v_{m-1})$), and stepping up until equality is
rejected.
Before going into the full details of the scheme, it is neces¬
sary to establish how the Bootstrap behaves when resam¬
pling data to calculate eigenvalues.
3.2. The Bootstrap and eigenvalues
The Bootstrap is a general tool for estimation of the distribution
of a statistic from a sample of data. In this case the
Bootstrap is employed to estimate $F_{N\lambda}(\hat\lambda)$. The principle
of the Bootstrap is as follows. The original data
$x(n)$, $n \in [1, N]$, i.e.,
$$X_N = [x(1), \ldots, x(N)], \qquad (8)$$
is an estimate of the distribution of $x(n)$. Assigning each
snapshot a probability $1/N$, resamples are taken randomly
(with replacement) from $X_N$, giving Bootstrap data
$$X_N^* = [x^*(1), \ldots, x^*(N)]. \qquad (9)$$
From the Bootstrap resample $X_N^*$, the sample eigenvalues
are calculated (through (4)), giving
$$\hat\lambda^* = [\hat\lambda_1^*, \ldots, \hat\lambda_m^*] \qquad (10)$$
with $\hat\lambda_1^* > \ldots > \hat\lambda_i^* > \ldots > \hat\lambda_m^*$. The procedure is repeated
$B$ times. Then, the Bootstrap distribution
derived from the $B$ replicates of (10) is a nonparametric
estimate of $F_{N\lambda}(\hat\lambda)$.
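The resampling loop described above can be sketched as follows in Python/NumPy. The function names, the complex white test data, and the optional resample size `M` are assumptions made for the illustration; the sketch simply draws snapshot indices with replacement and recomputes the eigenvalues of (4) for each resample.

```python
import numpy as np

def sample_eigenvalues(X):
    """Eigenvalues (descending) of the sample covariance (4); X is m x N."""
    n_snap = X.shape[1]
    R_hat = X @ X.conj().T / n_snap
    return np.linalg.eigvalsh(R_hat)[::-1]

def bootstrap_eigenvalues(X, B, M=None, rng=None):
    """Return a (B, m) array; row b holds the eigenvalues of resample b."""
    rng = np.random.default_rng() if rng is None else rng
    m, N = X.shape
    M = N if M is None else M          # resample size (defaults to N)
    lam_star = np.empty((B, m))
    for b in range(B):
        idx = rng.integers(0, N, size=M)   # each snapshot has probability 1/N
        lam_star[b] = sample_eigenvalues(X[:, idx])
    return lam_star

# Hypothetical test data: 6 channels of complex white noise, 100 snapshots.
rng = np.random.default_rng(0)
X = (rng.standard_normal((6, 100)) + 1j * rng.standard_normal((6, 100))) / np.sqrt(2)
lam_star = bootstrap_eigenvalues(X, B=25, M=75, rng=rng)
```

Each column of `lam_star` is then a set of $B$ draws from the Bootstrap marginal of one ordered eigenvalue.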
As the sample eigenvalues are highly non-linear functions
of the data sample, results on the Bootstrap w.r.t. linear
statistics do not apply. However, some results on the
properties of eigenvalues calculated from resampled data
can be found in [3][4]:
• For distinct population eigenvalues, $F^*_{N\lambda}(\hat\lambda)$ converges
asymptotically to $F_{N\lambda}(\hat\lambda)$.
• For equal population eigenvalues (such as in the white
noise case), $F^*_{N\lambda}(\hat\lambda)$ does not converge to $F_{N\lambda}(\hat\lambda)$.
However, if resamples are taken of size $M < N$ from
$X_N$, such that $M \to \infty$ as $N \to \infty$, while $M/N \to 0$,
then $F^*_{M\lambda}(\hat\lambda)$ converges weakly to $F_{M\lambda}(\hat\lambda)$, i.e., the
distribution of the eigenvalues of a sample
$x(n)$, $n \in [1, M]$.
From numerical experiments it is easily seen that the major
problem with the Bootstrap is to characterize the dependence
between sample eigenvalues: while the Bootstrap does
a good job of capturing the marginals, the dependence
between the sample eigenvalues is not maintained in
$F^*_{N\lambda}(\hat\lambda)$ for reasonable $N$. This motivates the use of the
marginals only. Also, a full characterization of the joint $m$-dimensional
distribution would require a very large data
record ($N$). By only considering the marginals, a much
smaller data size is required. It is also worth considering
resamples of size $M < N$. This relaxes the strong dependence
on the actual data $X_N$ somewhat, which seems to
remove some erratic behavior seen on small sample sizes.
3.3. Detection scheme
The full estimation/detection procedure is as follows:
1. Estimate the marginal distributions $f_{M,i}(\hat\lambda_i)$,
$i \in [1, m]$, by taking $B$ resamples of size $M$ from
the data $X_N$. For each resample, calculate the sample
eigenvalues $\hat\lambda^*$ (10).
2. Estimate the distributions of $v_i$, $i \in [1, m-1]$ (7). To
do this, note that in place of the fictitious independent
variables $\eta_i$, $i \in [1, m]$, sample eigenvalues $\hat\lambda_i^*$ from
different resamples $\hat\lambda^*$ can be used (the sample eigenvalues
from one resample are correlated). Thus, form
$$v_i^* = (\hat\lambda_i^*)_j/(\hat\lambda_{i+1}^*)_k, \quad i \in [1, m-1] \qquad (11)$$
with $j$ and $k$ being different resamples. Although an
arbitrary number of resamples ($B'$) of (11) could be
taken, it is sensible to use all $B$ $\hat\lambda^*$ from step 1 in a
systematic way. Estimate the CDFs of $v_i$,
$i \in [1, m-1]$, by the staircase approximation
$$F_{v,i}(x) = \mathrm{number\ of}(v_i^* < x)/B. \qquad (12)$$
3. Determine the test statistics for the one-sided Kolmogorov-Smirnov
(KS) test from the distributions (12)
$$T_i = \sup_x\left(F_{v,i+1}(x) - F_{v,i}(x)\right) \qquad (13)$$
for $i \in [1, m-2]$. Under the hypothesis that
$F_{v,i+1}(x)$ and $F_{v,i}(x)$ are equal, the test statistic $T_i$ is
asymptotically distributed as [7]
$$P\left(\sqrt{B/2}\,T_i < x\right) \to 1 - \exp(-2x^2) \qquad (14)$$
for $x > 0$.
4. Final step. Determine the rank $d$ from a sequential test
on the KS statistics:
I Set $i = m-2$.
II Define the null hypothesis H: $d = i$, and the
alternative hypothesis K: $d < i$.
III Set a threshold $\gamma$ based on the tail area of the distribution
(14) of (13) under K [7].
IV If $T_i > \gamma$ accept H (i.e. reject equality of distributions)
and stop, else set $i = i-1$ and return to II.
Note that in order to enable a correct decision, the test procedure
requires that there are at least two noise eigenvalues.
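The four steps can be put together in a compact Python/NumPy sketch. It is an illustration under stated assumptions and not the authors' implementation: the function names are hypothetical, the ratios (11) pair resample $j$ with resample $j+1$ cyclically as one possible "systematic way", the default $M = 3N/4$ and $\gamma = 0.7$ are taken from the discussion in the text, and returning 0 when no test rejects is a convention chosen here.

```python
import numpy as np

def bootstrap_eigs(X, B, M, rng):
    """(B, m) array of ordered eigenvalues from resamples of size M."""
    m, N = X.shape
    lam = np.empty((B, m))
    for b in range(B):
        Xb = X[:, rng.integers(0, N, size=M)]        # with replacement
        lam[b] = np.linalg.eigvalsh(Xb @ Xb.conj().T / M)[::-1]
    return lam

def ks_one_sided(a, b):
    """sup_x (F_a(x) - F_b(x)) for the empirical CDFs of samples a and b."""
    grid = np.sort(np.concatenate([a, b]))
    F_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    F_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return float(np.max(F_a - F_b))

def estimate_rank(X, B=25, M=None, gamma=0.7, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    m, N = X.shape
    M = (3 * N) // 4 if M is None else M
    lam = bootstrap_eigs(X, B, M, rng)
    # Ratios (11): numerator from resample j, denominator from resample j+1.
    v = lam[:, :-1] / np.roll(lam, -1, axis=0)[:, 1:]   # column i-1 holds v_i
    for i in range(m - 2, 0, -1):                       # i = m-2, ..., 1
        T_i = ks_one_sided(v[:, i], v[:, i - 1])        # F_{v,i+1} - F_{v,i}
        if T_i > gamma:
            return i                                    # accept H: d = i
    return 0                                            # convention (assumption)

# Hypothetical test scene: 6-element half-wavelength ULA, two sources.
rng = np.random.default_rng(0)
m, N, d = 6, 100, 2
theta = np.deg2rad([10.0, 25.0])
A = np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(theta)))
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
V = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
d_hat = estimate_rank(A @ S + V, rng=rng)
```

The loop mirrors steps I to IV: it starts from the two smallest ratio distributions and moves upward until the one-sided statistic exceeds the threshold.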
There are a number of parameters to be tuned/chosen
in the scheme. First, consider the resample size $M$. A
smaller $M$ tends to improve the estimate of $f_{M,i}(\hat\lambda_i)$. However,
a small $M$ leads to a loss in the signal-to-noise ratio
(SNR) detection threshold (i.e., the minimum SNR
required for reliable rank detection), as the relative distance
between $f_{M,d}(\hat\lambda_d)$ and $f_{M,d+1}(\hat\lambda_{d+1})$ decreases with
decreasing $M$. For a data size $N$ of order $O(10^2)$, a reasonable
trade-off is $M = 3N/4$.
The number of Bootstrap resamples $B$ has an impact
on the estimated distributions and is therefore a crucial
parameter. Some guidelines on the impact of the number of
Bootstrap resamples $B$ can be found in [5][6]. Unfortunately, no
results are given in absolute terms. However, note that the
proposed detection scheme does not require any critical
values to be estimated with high precision. What is important
is that the locations of the distributions of the $v_i$ are
estimated with sufficient accuracy for the subsequent KS
test to work properly. Thus, $B$ should be large enough such
that the means of $v_i^*$ are reasonably stable on a normalized
scale. A coarse first order approximation of $E[v_i^*]$ gives
$$E[v_i^*] \approx E[\hat\lambda_i^*]/E[\hat\lambda_{i+1}^*], \qquad (15)$$
i.e., the stability of the location depends to a large extent on
the location of $\hat\lambda_i^*$. To arrive at an expression for the necessary
$B$, note that the sample eigenvalues are reasonably
close to Gaussian. The separation (bias) of two sample
eigenvalues corresponding to equal population eigenvalues
is roughly two times the standard error $\sigma$, see Figure 1 (note
that this relation holds regardless of $M$). Now, the standard
error of the sample mean of $B$ iid Gaussian variables is
$\sigma/\sqrt{B}$. Thus, the location error of $f_{M,i}(\hat\lambda_i)$, normalized to
the separation of neighbouring distributions, is of order
$$\frac{\sigma/\sqrt{B}}{2\sigma} = \frac{1}{2\sqrt{B}}. \qquad (16)$$
As an example, with $B = 25$ the location error is of order
0.1, which is small enough for reliable detection. Note that
there is no point using too large a $B$, as the error originating
from the approximation (6) then will dominate the ‘randomness’
in $T_i$.
The final parameter to be chosen is the threshold $\gamma$ for
the KS test in Step 4. This threshold can be determined in
two ways. First, $\gamma$ can be set to maintain a desired level of
the test at each sequential stage (as in the sphericity test),
based on the distribution (14) of the test statistic (13) under
K (the hypothesis that the distributions are equal). Alternatively,
$\gamma$ can be set for ‘MDL-like consistency’. To see this,
note that $T_i \to 1$ rapidly under H for increasing SNR or
$N$. At the same time, under K, the tail probability of $T_i$ is
small even for modest $\gamma$. Thus, $\gamma$ can be set to provide a
probability of detection very close to one, without much
penalty in the SNR threshold. As an example, with
$B = 25$, the 95% level under K is $\gamma = 0.35$. With
$\gamma = 0.7$, the level is 99.9995%.
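The quoted levels can be reproduced with a few lines of Python. The sketch assumes the usual two-sample normalization in (14), i.e. that both CDFs are built from $B$ ratios so that $x = \sqrt{B/2}\,\gamma$ and the level becomes $1 - \exp(-B\gamma^2)$; the function names are hypothetical.

```python
import math

def ks_level(B, gamma):
    """Asymptotic P(T_i < gamma) under equal distributions.

    Uses x = sqrt(B/2) * gamma in 1 - exp(-2 x^2), i.e. 1 - exp(-B gamma^2).
    """
    return 1.0 - math.exp(-B * gamma**2)

def ks_threshold(B, level):
    """Threshold gamma that gives the requested asymptotic level."""
    return math.sqrt(-math.log(1.0 - level) / B)

level_035 = ks_level(25, 0.35)      # roughly 0.953
level_070 = ks_level(25, 0.70)      # roughly 0.999995
gamma_95 = ks_threshold(25, 0.95)   # roughly 0.346
```

Under this assumption the values agree with the 95% and 99.9995% figures quoted above for $B = 25$.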
Figure 1. CDFs of a) sample eigenvalues, b) scaled sample
eigenvalues, c) scaled Bootstrap eigenvalues, and d) the test
variables $(\hat\lambda_i^*)_j/(\hat\lambda_{i+1}^*)_k$.
4. NUMERICAL EXAMPLES
The detection scheme relies on the validity of the assumption
(6). To illustrate the principle of the test, data was generated
according to the model (1): a 6-element uniform
linear array with half-wavelength element spacing receives
$d = 2$ uncorrelated Gaussian signals from directions
[10°, 25°], relative to the array broadside. The signals
were observed in white Gaussian noise with an element
SNR of -3 dB. Figure 1a shows the marginal CDFs of the 6
sample eigenvalues, when calculated based on $N = 100$
independent array snapshots. Figure 1b shows the CDFs
when the sample eigenvalues have been pre-scaled with
$\kappa^{i-4}$ (relative to eigenvalue number four) as in (6). In this
case, $\kappa = 1.21$, and the scaled noise CDFs are all very
close, with a largest pair-wise separation $|F_{\kappa,i} - F_{\kappa,i+1}|$ of
0.11 for $i > d$.
Similarly, Figure 1c shows the CDFs of scaled
($\kappa = 1.36$) Bootstrap eigenvalues, estimated from $B = 50$
resamples of size $M = 75$, taken from one data realization
of $N = 100$ snapshots. Clearly, the Bootstrap eigenvalues
are slightly more variable, which is due both to $M < N$
and the effective loss in sample size from resampling.
Again, the noise CDFs are close, but with some random
fluctuations due to the limited number $B$. However, note
that even with an infinite number of Bootstrap resamples, there will
still be a remaining error due to the approximation (6) (as
in Figure 1b), as well as the limitation of the Bootstrap
itself [3][4]. Finally, the CDFs of the variables (11), calculated
from the $B = 50$ sets of Bootstrap eigenvalues, are
shown in Figure 1d. These are the CDFs on which the KS
test is based. The ‘noise only’ CDFs are close, while the
CDF of $\hat\lambda_2^*/\hat\lambda_3^*$ is the rightmost; with increasing SNR or
data size this CDF moves further to the right. Clearly, the
KS test can easily decide on the correct rank from the separation
of the CDFs. Note that the dashed CDF is due to the
two signal eigenvalues.
Figure 2. The probability of correctly estimating the rank
($d = 2$) versus the SNR, for various spatial noise color: a)
Proposed scheme, b) MDL.
In the ideal case, with white Gaussian noise, the performance
of the proposed scheme is virtually identical to
MDL and the sphericity test (depending on how the threshold
is set; for ‘consistency’, or for a fixed level) in terms of
SNR and data size thresholds, and the ability to resolve
closely spaced targets. Instead, the power of the new
method lies in its robustness to unknown noise color. To
illustrate, data was generated as for Figure 1, but
varying the spatial noise color. Specifically, the $kl$-th element
of the noise covariance matrix in (2) was
$(R_v)_{kl} = \exp(-\alpha|k-l|)$, with $\alpha$ being the parameter to
be varied. For the detection procedure, $B = 25$ resamples
of size $M = 75$ were taken from each original data set of
size $N = 100$. The chosen $B$ is at a boundary: a smaller $B$
leads to a penalty in SNR threshold, whereas a larger $B$ gives
no further improvement. The threshold $\gamma$ was set to 0.7 for
‘consistent’ detection. The performance of the proposed
scheme as well as MDL as a function of the SNR is shown
in Figure 2a-b, for $\alpha \in \{0, 0.1, 0.2, 0.3\}$. It is seen that the
proposed scheme maintains good detection performance
for increasing $\alpha$. However, there is a penalty in the low
SNR threshold. This is caused by the distributions of the
noise eigenvalues being further separated for an increasing
$\alpha$, leading to a reduction in the SNR margin. Increasing $\alpha$
beyond 0.3 causes a more substantial degradation as the
approximation (6) is no longer good. The performance of
MDL suffers at comparatively small $\alpha$.
a) Proposed
6. REFERENCES
Figure 3. The probability of correctly estimating the rank
( d = 2) versus the SNR, for various temporal noise color: a)
Proposed scheme, b) MDL.
Similarly to an unknown spatial noise correlation, an
unknown temporal noise correlation may also degrade
detection performance. The above experiment
was repeated with spatially white but temporally
colored noise, having a temporal covariance of
$r(\tau) = \exp(-\alpha|\tau|)$, with varying $\alpha$. With temporally
colored noise, the data are no longer iid. To obtain a better result
with the Bootstrap, block resampling [9] was used.
The resample size was $M = 70$, with each
resample made up of 7 random sections of 10 consecutive
snapshots from the original data (drawn with replacement). The
number of resamples was increased to $B = 50$.
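The block resampling described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `block_resample_indices` is hypothetical, and block starting points are drawn uniformly over all admissible positions (so blocks may overlap), which the text does not specify.

```python
import random

def block_resample_indices(n_snapshots, block_len, n_blocks, rng):
    """Concatenate n_blocks randomly placed blocks of block_len
    consecutive snapshot indices, drawn with replacement."""
    last_start = n_snapshots - block_len      # last admissible block start
    indices = []
    for _ in range(n_blocks):
        start = rng.randint(0, last_start)    # inclusive at both ends
        indices.extend(range(start, start + block_len))
    return indices

rng = random.Random(0)  # fixed seed, only for reproducibility
# Values from the experiment: N = 100, 7 blocks of 10 snapshots, B = 50.
resamples = [block_resample_indices(100, 10, 7, rng) for _ in range(50)]
```

Each index list selects the columns of the original data matrix that form one resample of size 70; consecutive snapshots inside a block keep their temporal correlation.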
The results for various SNR and $\alpha$ are shown in Figure 3a-b.
As seen, MDL loses performance with increasing
$\alpha$. This is easily explained, as a temporal correlation
reduces the effective data size. Since the nominal data size $N$
then overstates the effective data size, the penalty term in MDL
is erroneous. The proposed technique, on the other hand, does not
rely directly on $N$, making it robust to the temporal noise
correlation.
6. REFERENCES
[1] T. W. Anderson, ‘Asymptotic Theory for Principal Component
Analysis’, Ann. Math. Statist., vol. 34, 1963.
[2] T. W. Anderson, ‘An Introduction to Multivariate Statistical
Analysis, 2nd ed.’, Wiley, 1984.
[3] R. Beran, M. S. Srivastava, ‘Bootstrap Tests and Confidence
Regions for Functions of a Covariance Matrix’, Annals of
Statistics, vol. 13, no. 1, 1985.
[4] R. Beran, M. S. Srivastava, ‘Correction: Bootstrap Tests and
Confidence Regions for Functions of a Covariance Matrix’,
Annals of Statistics, vol. 15, no. 1, 1987.
[5] B. Efron, R. Tibshirani, ‘An Introduction to the Bootstrap’,
Chapman and Hall, 1993.
[6] P. Hall, ‘On the Number of Bootstrap Simulations Required
to Construct a Confidence Interval’, Annals of Statistics, vol.
14, no. 4, 1986.
[7] E. B. Manoukian, ‘Modern Concepts and Theorems of
Mathematical Statistics’, Springer, 1986.
[8] R. J. Muirhead, ‘Latent Roots and Matrix Variates: A Review
of Some Asymptotic Results’, Annals of Statistics, vol. 6, no.
1, 1978.
[9] D. N. Politis, ‘Computer-Intensive Methods in Statistical
Analysis’, IEEE Signal Processing Magazine, Jan. 1998.
[10] B. Porat, ‘Digital Processing of Random Signals’, Prentice
Hall, 1994.
[11] P. Stoica, M. Cedervall, ‘Detection Tests for Array Processing
in Unknown Correlated Noise Fields’, IEEE Trans. Signal
Processing, vol. 45, no. 9, Sept. 1997.
[12] Q. Wu, K. M. Wong, ‘Determination of the Number of Signals
in Unknown Noise Environments-PARADE’, IEEE
Trans. Signal Processing, vol. 43, no. 1, Jan. 1995.
5. CONCLUSIONS
A new technique for rank estimation has been presented.
While giving performance similar to that of classical, well-known
techniques under ideal conditions, the new method, based
on the Bootstrap, is robust to errors in the noise model. The
price for robustness is an increase in computational
complexity. However, as the number of Bootstrap replications
is fairly small, this increase is modest.
DETECTION-ESTIMATION OF
MORE UNCORRELATED SOURCES THAN SENSORS
IN NONINTEGER SPARSE LINEAR ANTENNA ARRAYS
Yuri I. Abramovich, Nicholas K. Spencer
Cooperative Research Centre for Sensor Signal and Information Processing (CSSIP),
SPRI Building, Technology Park Adelaide, Mawson Lakes, South Australia, 5095, Australia
yuri@cssip.edu.au nspencer@cssip.edu.au
ABSTRACT
We introduce a new approach to the detection-estimation
problem for sparse linear antenna arrays comprising
$M$ identical sensors whose positions may be
noninteger values (expressed in half-wavelength units).
This approach considers the (noninteger) $M_\alpha$-element
co-array as the most appropriate virtual array to be
used in connection with the augmented covariance matrix.
Since the covariance matrices derived from such
virtual arrays are usually very underspecified, we discuss
a maximum-likelihood (ML) completion philosophy
to fill in the missing elements of the partially specified
Hermitian covariance matrix. Next, a transformation
of the resulting unstructured ML matrix results
in a sequence of properly structured positive-definite
Hermitian matrices, each with its $(M_\alpha - \mu)$ smallest
eigenvalues being equal, appropriate for the candidate
number of sources $\mu$. For each candidate model
($\mu = 1, \ldots, M_\alpha - 1$), we then find the set of
directions-of-arrival (DOA’s) and powers that yield the minimum
fitting error for the specified covariance lags in the
neighbourhood of the MUSIC-initialised DOA’s. Finally,
each of these models describes a hypothesis with respect
to the actual number of sources, which allows us to select
the “best” hypothesis using traditional information
criteria (AIC, MDL, MAP, etc.) that are based on the
likelihood ratio.
1. INTRODUCTION
In our previous papers [5, 3, 2, 4], we introduced a
new technique for detection-estimation of more uncorrelated
Gaussian sources $m$ than sensors $M$ ($m > M$)
for the class of integer-spaced arrays. Here, we present
one attempt to extend this approach to the class of
noninteger-spaced nonuniform linear arrays (NLA’s).
Since such arrays generate up to $\frac{1}{2}M(M-1)$ distinct
nonzero covariance lags, they have the potential [8] to
estimate a superior number of uncorrelated Gaussian
sources, ie. for the number of sources in the range
$$M < m \le \tfrac{1}{2}M(M-1). \qquad (1)$$
For a known number of sources $m$, we previously introduced
[6] a DOA estimation technique capable of
handling these superior scenarios. The current problem
of detection-estimation is more complicated since
we now require an estimate of both the number of
sources and their DOA’s.
Naturally, this problem has a solution if and only
if the identifiability conditions hold, which in this case
means that the observed set of covariance lags generated
by the NLA can be uniquely decomposed into
some number of signal dyads plus white noise. While
the nonidentifiability conditions for detection are given
in [4], here we concentrate on identifiable scenarios
only; that is, for the true (deterministic) covariance
lags and the chosen virtual array, the partially specified
covariance matrix has a unique completion that corresponds
to a mixture of $m$ uncorrelated plane waves in
white noise.
In practice, when the observed specified covariance
lags are stochastic, being produced by a sample $M$-variate
covariance matrix, the feasibility conditions for
our type of positive-definite (p.d.) completion are not
guaranteed. Therefore, in order to achieve a p.d. completion
with equalised $(M_\alpha - m)$ minimum eigenvalues,
even the specified (measured) covariance lags need
to be modified. Clearly, by not limiting the size of
the modification of the specified lags, we can achieve a
p.d. completion with the desired number $(M_\alpha - \mu)$ of
minimum eigenvalues being equal.
Note that for a Hermitian matrix to represent a
mixture of $\mu$ uncorrelated plane waves in noise, the
equality of the $(M_\alpha - \mu)$ smallest eigenvalues is only
a necessary condition (whereas this is the necessary
and sufficient condition for a Toeplitz matrix). Thus
some further modification of the specified covariance
lags is required in order to correctly model the sources,
along with an appropriate completion of the missing
(unspecified) covariance lags.
In this way, we finally obtain a number of candidate
models, ie. $M_\alpha$-variate p.d. Hermitian matrices of the
proper structure, that are now compared with the ML
completion discussed below using traditional information
criteria that judge a loss in likelihood ratio against
an overestimated number of sources.
2. PROBLEM FORMULATION
Consider $m$ narrow-band plane-wave signals of power
$p = [p_1, \ldots, p_m]$ impinging upon a nonuniform linear
array of $M$ identical omnidirectional sensors located
at positions $d = [d_1 = 0, d_2, \ldots, d_M]$ measured in
half-wavelength units. In the detection-estimation problem,
the number of sources $m$ is unknown. Adopting the
commonly used data model [12], we have
$$y(t) = S(\theta)\, x(t) + \eta(t) \quad \text{for } t = 1, \ldots, N \qquad (2)$$
where
$$x(t) = [x_1(t), \ldots, x_m(t)]^T \qquad (3)$$
$$y(t) = [y_1(t), \ldots, y_M(t)]^T \qquad (4)$$
$$\eta(t) = [\eta_1(t), \ldots, \eta_M(t)]^T, \qquad (5)$$
$x_j(t)$ ($j = 1, \ldots, m$) is the complex signal amplitude
of the $j$th plane wave, and where $y_k(t)$ and $\eta_k(t)$
($k = 1, \ldots, M$) are the sensor output and the noise at the
$k$th sensor respectively. To permit DOA estimation in
the superior case ($m > M$), we restrict ourselves to
the class of independent (Gaussian) signal amplitudes
$x(t) \in \mathcal{C}^{m \times 1}$ such that
$$\mathcal{E}\{x(t_1)\, x^H(t_2)\} = \begin{cases} P = \mathrm{diag}[p_1, \ldots, p_m] & \text{for } t_1 = t_2 \\ 0 & \text{for } t_1 \neq t_2. \end{cases} \qquad (6)$$
We assume that the additive noise $\eta(t) \in \mathcal{C}^{M \times 1}$ is white
and Gaussian:
$$\mathcal{E}\{\eta(t_1)\, \eta^H(t_2)\} = \begin{cases} p_0 I_M & \text{for } t_1 = t_2 \\ 0 & \text{for } t_1 \neq t_2. \end{cases} \qquad (7)$$
The array manifold matrix is $S(\theta) = [s(\theta_1), \ldots, s(\theta_m)] \in \mathcal{C}^{M \times m}$,
where each constituent “steering vector” $s(\theta_j)$
is defined as
$$s(\theta_j) = [1, \exp(i\pi d_2 w_j), \ldots, \exp(i\pi d_M w_j)]^T \qquad (8)$$
with $w = \sin\theta \in [-1, 1]$.
According to this model, the $M$-variate spatial covariance
matrix
$$R = S P S^H + p_0 I_M \qquad (9)$$
is p.d. Hermitian. Note that in our (“superior”) case
of $m > M$, the noise-free covariance matrix $S P S^H$ is
generally of full rank. Given $N$ independent samples
(“snapshots”), the sufficient statistic for DOA estimation
is the $M$-variate direct data covariance (DDC) matrix
$$\hat{R} = \frac{1}{N} \sum_{t=1}^{N} y(t)\, y^H(t). \qquad (10)$$
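As a minimal sketch of the data model (2) and the DDC matrix (10), the following simulates snapshots for a nonuniform array and forms the sample covariance. The function names, the DOA values, the powers and the snapshot count are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def steering_matrix(positions, doas_deg):
    """Columns are steering vectors exp(i*pi*d_k*sin(theta_j)), as in (8)."""
    d = np.asarray(positions, dtype=float)[:, None]
    w = np.sin(np.deg2rad(np.asarray(doas_deg, dtype=float)))[None, :]
    return np.exp(1j * np.pi * d * w)

def simulate_ddc(positions, doas_deg, powers, noise_power, n_snapshots, rng):
    """Draw N snapshots from model (2) and return the DDC matrix (10)."""
    S = steering_matrix(positions, doas_deg)
    M, m = S.shape
    # Circular complex Gaussian amplitudes with E|x_j|^2 = p_j.
    amp = np.sqrt(np.asarray(powers, dtype=float)[:, None] / 2)
    x = amp * (rng.standard_normal((m, n_snapshots))
               + 1j * rng.standard_normal((m, n_snapshots)))
    eta = np.sqrt(noise_power / 2) * (rng.standard_normal((M, n_snapshots))
                                      + 1j * rng.standard_normal((M, n_snapshots)))
    y = S @ x + eta
    return y @ y.conj().T / n_snapshots

rng = np.random.default_rng(0)
positions = [0.0, 1.09, 3.96, 5.93]            # illustrative noninteger positions
doas = [-50.0, -25.0, -5.0, 10.0, 30.0, 55.0]  # hypothetical DOAs, m = 6 > M = 4
R_hat = simulate_ddc(positions, doas, [1.0] * 6, 0.1, 2000, rng)
```

The resulting `R_hat` is Hermitian and, for enough snapshots, positive definite; its diagonal estimates the zero lag, which equals the sum of all source powers plus the noise power.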
To illustrate our technique, consider the “quasi-integer”
[6] four-element NLA
$$d_4 = [0, 1.09, 3.96, 5.93] \qquad (11)$$
that may be easily recognised as a slightly perturbed
version of the optimal four-element integer array [10]
$d = [0, 1, 4, 6]$. In [6], we demonstrated that up to six
independent sources could be unambiguously identified
by the NLA $d_4$. The co-array of $d_4$ (the sorted set of
nonduplicated position differences) is
$$c_4 = [0, 1.09, 1.97, 2.87, 3.96, 4.84, 5.93] \qquad (12)$$
and so the augmented $M_\alpha = 7$-variate Hermitian covariance
matrix for the virtual array $c_4$ is extremely
underspecified:
$$H = \begin{bmatrix}
r_0 & r_{1.09} & r_{1.97} & r_{2.87} & r_{3.96} & r_{4.84} & r_{5.93} \\
r^*_{1.09} & r_0 & ? & ? & ? & ? & ? \\
r^*_{1.97} & ? & r_0 & ? & ? & ? & ? \\
r^*_{2.87} & ? & ? & r_0 & ? & ? & ? \\
r^*_{3.96} & ? & ? & ? & r_0 & ? & ? \\
r^*_{4.84} & ? & ? & ? & ? & r_0 & ? \\
r^*_{5.93} & ? & ? & ? & ? & ? & r_0
\end{bmatrix} \qquad (13)$$
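The co-array (12) and the size of the completion problem in (13) can be checked with a few lines of code. This is only a sketch: the helper name `co_array` and the rounding to two decimals (used to merge differences that agree up to floating-point error) are assumptions made for the illustration.

```python
def co_array(positions, decimals=2):
    """Sorted set of nonduplicated, non-negative position differences."""
    diffs = {round(abs(a - b), decimals) for a in positions for b in positions}
    return sorted(diffs)

d4 = [0.0, 1.09, 3.96, 5.93]
c4 = co_array(d4)
M_alpha = len(c4)

# In (13) only the first row, the first column and the diagonal are shown as
# specified, so every other upper-triangular entry is a missing lag.
n_upper = M_alpha * (M_alpha - 1) // 2
n_missing = n_upper - (M_alpha - 1)
```

For the array $d_4$ this gives a seven-element co-array and 15 missing complex entries above the diagonal.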
Nevertheless, it is important to understand that for the
true covariance lags $r_0, \ldots, r_{5.93}$, identifiability means
that there exists a single p.d. completion of $H$ with
equalised $(M_\alpha - \mu)$ minimum eigenvalues for any scenario
with $m \le 6$ independent sources.
Let $S$ be the set of specified elements $\{p, q\}$, and $\bar{S}$
be the set of unspecified elements in the initial incomplete
augmented covariance matrix $H$. Suppose for the
moment that given the specified sample covariance lags
$\hat{r}_S$, we somehow generate a set of candidate p.d. $M_\alpha$-variate
Hermitian matrices $H_\mu$ ($\mu = 1, \ldots, M_\alpha - 1$)
that each correspond to the model of $\mu$ plane waves
in noise. To select the best candidate model, we calculate
the likelihood ratio (LR) for each corresponding
$M$-variate Hermitian matrix
$$R_\mu = L H_\mu L^T \qquad (14)$$
where $L$ is the $M \times M_\alpha$ binary selection (or incidence)
matrix with $L_{jk}$ equal to unity in the $j$th row and $d_j$th
column, and zero otherwise.
If we use the sphericity test [11]
$$H_0: \mathcal{E}\{R_\mu^{-1/2} \hat{R} R_\mu^{-1/2}\} = p_0 I_M \quad \text{against} \quad H_1: \mathcal{E}\{R_\mu^{-1/2} \hat{R} R_\mu^{-1/2}\} \neq p_0 I_M, \quad p_0 > 0 \qquad (15)$$
then the LR is
$$\gamma(H_\mu) = \left[ \frac{\det(R_\mu^{-1} \hat{R})}{\left[ \frac{1}{M} \mathrm{Tr}(R_\mu^{-1} \hat{R}) \right]^M} \right]^N. \qquad (16)$$
Now information or Bayesian criteria may be used for
model selection, such as the minimum description length
[7]
$$\hat{m}_{\mathrm{MDL}} = \arg\min_\mu \left\{ -\log \gamma(H_\mu) + \tfrac{3}{2} \mu \log N \right\}. \qquad (17)$$
Obviously this approach is optimal only if $H_\mu$ is the ML
estimate of the p.d. Hermitian matrix with equal $(M_\alpha - \mu)$
minimum eigenvalues. Since exact ML estimates
of this kind are not yet available, our problem is to
generate a set of the above-described Hermitian matrices
$H_\mu$ sufficiently close to the sufficient statistic $\hat{R}$ in the
ML sense.
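A sphericity-type likelihood ratio of this kind compares the geometric and arithmetic means of the eigenvalues of $R_\mu^{-1}\hat{R}$. The sketch below evaluates its logarithm, assuming the standard form in which the ratio is raised to the power $N$; the function name and the small test matrices are hypothetical.

```python
import numpy as np

def log_sphericity_lr(R_model, R_hat, n_snapshots):
    """log of [det(A) / (Tr(A)/M)^M]^N with A = R_model^{-1} R_hat."""
    M = R_hat.shape[0]
    A = np.linalg.solve(R_model, R_hat)       # R_model^{-1} R_hat without an explicit inverse
    _, logdet = np.linalg.slogdet(A)
    mean_eig = np.trace(A).real / M
    return n_snapshots * (logdet - M * np.log(mean_eig))

R_demo = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical sample covariance
lr_matched = log_sphericity_lr(3.0 * R_demo, R_demo, 100)   # model equal up to scale
lr_mismatch = log_sphericity_lr(np.eye(2), R_demo, 100)     # wrong model
```

The log-ratio is zero whenever the model equals the sample covariance up to a positive scale factor and negative otherwise, so a penalised criterion can trade this loss against the number of sources.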
3. “MAXIMUM-LIKELIHOOD”
POSITIVE-DEFINITE HERMITIAN
COMPLETION
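In [6], we introduced several p.d. Hermitian completions,
including maximum-entropy (ME) completion.
These completions are used here as an initialisation
step for the following optimisation routine. Let the
general virtual array $d'$ be specified by the virtual sensor
positions $d'_j$ ($j = 1, \ldots, M_\alpha$), then the set of all
possible p.d. Hermitian completions $\mathcal{H}$ may be written
as
$$\mathcal{H} = \Big\{ z : H(z) = H_0 + \sum_{\substack{p,q \in \bar{S} \\ p < q}} \big( \mathrm{Re}\,H_{pq}\, E^{+}_{pq} + i\, \mathrm{Im}\,H_{pq}\, E^{-}_{pq} \big) > 0 \Big\} \qquad (18)$$
where
$$z = \begin{bmatrix} \mathrm{Re}\,H_{pq} \\ \mathrm{Im}\,H_{pq} \end{bmatrix}_{pq \in \bar{S};\ p < q} \qquad (19)$$
$$E^{+}_{pq} = e_p e_q^T + e_q e_p^T, \qquad E^{-}_{pq} = e_p e_q^T - e_q e_p^T \qquad (20)$$
$e_p = [0, \ldots, 0, 1, 0, \ldots, 0]^T$ is the $M_\alpha$-variate basis vector
with a unit entry in the $p$th position, and $H_0$ is the
initial completion (eg. ME completion).
Suppose we label each of the missing lags ($pq \in \bar{S}$; $p < q$)
from 1 to $\ell$, the total number of missing
lags. For nonredundant NLA geometries such as $d_4$,
the number of missing lags is rather large:
$$\ell = \tfrac{1}{2}(\nu - 1)(\nu - 2), \qquad (21)$$
where $\nu = \frac{1}{2}M(M-1) + 1$. Now, instead of (18), we
may write
$$\mathcal{H} = \Big\{ z : H(z) = H_0 + \sum_{j=1}^{2\ell} z_j F_j > 0 \Big\} \qquad (22)$$
where
$$z_j = \begin{cases} \mathrm{Re}\,H_{pq},\ pq \in \bar{S};\ p < q & \text{for } j = 1, \ldots, \ell \\ \mathrm{Im}\,H_{pq},\ pq \in \bar{S};\ p < q & \text{for } j = \ell + 1, \ldots, 2\ell \end{cases} \qquad (23)$$
$$F_j = \begin{cases} E^{+}_{pq} & \text{for } j = 1, \ldots, \ell \\ i E^{-}_{pq} & \text{for } j = \ell + 1, \ldots, 2\ell. \end{cases} \qquad (24)$$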
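The parametrisation by real coefficients and Hermitian basis matrices can be made concrete as follows. This is a sketch under assumptions: indices are 0-based, the missing pairs are taken to be all upper-triangular entries outside the first row (as displayed for the seven-element virtual array), and the identity matrix stands in for the initial completion.

```python
import numpy as np

def basis_matrices(n, missing_pairs):
    """Return the 2*len(missing_pairs) Hermitian matrices: first all
    E+ (real parts), then all i*E- (imaginary parts)."""
    F_re, F_im = [], []
    for p, q in missing_pairs:
        E_plus = np.zeros((n, n), dtype=complex)
        E_plus[p, q] = E_plus[q, p] = 1.0
        E_minus = np.zeros((n, n), dtype=complex)
        E_minus[p, q], E_minus[q, p] = 1.0, -1.0
        F_re.append(E_plus)
        F_im.append(1j * E_minus)
    return F_re + F_im

M_alpha = 7
missing = [(p, q) for p in range(1, M_alpha) for q in range(p + 1, M_alpha)]
F = basis_matrices(M_alpha, missing)

# Perturb one missing entry: real part +0.3, imaginary part -0.2.
z = np.zeros(len(F))
z[0], z[len(missing)] = 0.3, -0.2
H0 = np.eye(M_alpha, dtype=complex)            # placeholder initial completion
H = H0 + sum(zj * Fj for zj, Fj in zip(z, F))
```

Because every basis matrix is Hermitian and the coefficients are real, any such combination stays Hermitian; positive definiteness still has to be checked separately.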
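For sufficiently small $z_k$ in (22)
$$|z_k| < \varepsilon \quad \text{for } k = 1, \ldots, 2\ell \qquad (25)$$
we may treat the term $\sum_{k=1}^{2\ell} z_k F_k$ as being equal to
a perturbation matrix $\delta H(z_0)$, and find a first-order
expansion for the eigenvalues [9]. According to the
sphericity LR (16), the problem of ML maximisation
is associated with the problem of eigenvalue equalisation
in the matrix
$$G(z) = \hat{R}^{-1/2}\, [L H(z) L^H]\, \hat{R}^{-1/2}. \qquad (26)$$
By applying a first-order expansion to the eigenvalues
of $G(z)$:
$$G(z) = G_0 + \sum_{k=1}^{2\ell} z_k\, \hat{R}^{-1/2} L F_k L^H \hat{R}^{-1/2} \qquad (27)$$
we can derive that
$$\lambda_g[G(z)] = \lambda_g[G_0] + \sum_{k=1}^{2\ell} z_k\, u_g^{(0)H} \hat{R}^{-1/2} L F_k L^H \hat{R}^{-1/2} u_g^{(0)} \qquad (28)$$
where $u_g^{(0)}$ ($g = 1, \ldots, M$) is the $g$th eigenvector of the
matrix $G_0$, with corresponding eigenvalue $\lambda_g[G_0]$. Now
we can introduce the ($M \times 2\ell$) matrix
$$\mathcal{D}^{(0)} = \left( u_g^{(0)H} \hat{R}^{-1/2} L F_k L^H \hat{R}^{-1/2} u_g^{(0)} \right)_{g = 1, \ldots, M;\ k = 1, \ldots, 2\ell} \qquad (29)$$
and our search to find sufficiently small perturbations
($|z_k| < \varepsilon$) that minimise the difference between the
$(M_\alpha - \mu)$ smallest eigenvalues of $G(z)$ may then be
formulated as the following linear programming (LP)
problem:
$$\text{Find } \min(\alpha - \beta) \text{ subject to} \qquad (30)$$
$$\lambda^{(0)} + \mathcal{D}^{(0)} z \le \alpha \mathbf{1}, \quad \alpha > 0 \qquad (31)$$
$$\lambda^{(0)} + \mathcal{D}^{(0)} z \ge \beta \mathbf{1}, \quad \beta > 0 \qquad (32)$$
$$-\varepsilon \le z_k \le \varepsilon \quad \text{for } k = 1, \ldots, 2\ell \qquad (33)$$
where $\lambda^{(0)}$ is the vector of noise-subspace eigenvalues: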
A(0) = [A£_M+1,...,AgjT. (34)
and $\mathbf{1} = [1, \ldots, 1]^T$. Let the solution of this LP problem
be $z^{(0)}$, then we define an updated Hermitian matrix
$$H^{(1)} = H^{(0)} + \sum_{k=1}^{2\ell} z_k^{(0)} F_k \qquad (35)$$
and so by direct decomposition
$$G^{(1)} = \hat{R}^{-1/2} L H^{(1)} L^H \hat{R}^{-1/2} \qquad (36)$$
we may check the validity of the constraints (33), and
decrease the perturbation “step size” $\varepsilon$ if our equalisation
step failed to improve the current differences
amongst the noise-subspace eigenvalues of the matrix
$G^{(1)}$. If the validity conditions are met, then we compute
the associated $\mathcal{D}^{(1)}$ and $\lambda^{(1)}$ and then solve the
iterated LP problem. Suppose that $\kappa$ iterations are required
before this procedure essentially reaches its final
stable point.
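One equalisation step of this kind can be sketched with an off-the-shelf LP solver. The code below is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the perturbation directions are passed in directly as Hermitian matrices (playing the role of $\hat{R}^{-1/2} L F_k L^H \hat{R}^{-1/2}$), all eigenvalues are equalised, and SciPy's `linprog` is used as the solver.

```python
import numpy as np
from scipy.optimize import linprog

def sensitivity_matrix(G0, directions):
    """Eigenvalues of G0 and the matrix whose (g, k) entry is u_g^H K_k u_g,
    the first-order change of eigenvalue g per unit step along K_k."""
    lam, U = np.linalg.eigh(G0)
    D = np.array([[np.real(U[:, g].conj() @ K @ U[:, g]) for K in directions]
                  for g in range(G0.shape[0])])
    return lam, D

def equalise_step(lam, D, eps):
    """Solve min(alpha - beta) s.t. beta*1 <= lam + D z <= alpha*1, |z_k| <= eps."""
    n_eig, n_var = D.shape
    ones = np.ones((n_eig, 1))
    zeros = np.zeros((n_eig, 1))
    cost = np.r_[np.zeros(n_var), 1.0, -1.0]      # variables: z, alpha, beta
    A_ub = np.block([[D, -ones, zeros],           #  D z - alpha <= -lam
                     [-D, zeros, ones]])          # -D z + beta  <=  lam
    b_ub = np.r_[-lam, lam]
    bounds = [(-eps, eps)] * n_var + [(0, None), (0, None)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n_var], res.fun

# Toy example: eigenvalues 0.8 and 1.2, one off-diagonal direction.
G0 = np.array([[1.0, 0.2], [0.2, 1.0]])
K = [np.array([[0.0, 1.0], [1.0, 0.0]])]
lam, D = sensitivity_matrix(G0, K)
z, gap = equalise_step(lam, D, eps=0.1)
```

In the toy example the step size bound is active: the eigenvalue spread shrinks from 0.4 to 0.2, and repeating the step from the updated matrix would close it further, mirroring the iteration and step-size control described above.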
Naturally, the global optimality of the overall procedure
cannot be guaranteed, whereas at each step (ie. locally),
the LP routine provides the optimal solution.
Note that during this first stage of our routine, only
the unspecified (missing) elements of $H$ have been
varied, while the specified sample covariance lags remain
the same as for the initial point $H_0$.
Now, during the second stage of the ML maximisation
routine, we modify all covariance lags. Since
small perturbations in the sample covariance lags of
$\hat{R}$ (with respect to the exact values in $R$) lead to significant
fluctuations in the noise-subspace eigenvalues
$\sigma_n$ of the matrix $H$, “inverse perturbations” in $H$ that
equalise up to the $(M_\alpha - m)$ smallest eigenvalues should
not involve significant changes to the sample covariance
lags. Effectively, we use the same optimisation routine
(30) here with the only significant difference that now
all elements (except the diagonals) are varied, ie.
$$H^{(\kappa+1)} = H^{(\kappa)} + \sum_{k=1}^{M(M-1)} z_k^{(0)} F_k. \qquad (37)$$
Given that we cannot guarantee the global optimality
of this second optimisation routine either, we may treat
the solution ($H_{ML}$, say) as the unstructured ML estimate
of the $M_\alpha$-variate covariance matrix. Therefore
the probability of obtaining the desired number of
identical minimum eigenvalues in $H_{ML}$ is zero.
For this reason, our third stage involves obtaining
a properly structured ML estimate that corresponds to
a mixture of $\mu$ independent plane waves in noise. The
unstructured ML estimate $H_{ML}$ is used as a sufficient
statistic, and further modification of the unspecified
entries occurs in order to equalise the $(M_\alpha - \mu)$ smallest
eigenvalues in this matrix. Obviously, we expect that the
more eigenvalues are to be equalised, the greater the
loss in the LR compared with the ML
estimate $H_{ML}$.
Similarly to the above, we may present this equalisation
routine as
$$H_\mu^{(j+1)} = H_\mu^{(j)} + \sum_{k=1}^{2\ell} z_k F_k, \qquad H_\mu^{(0)} = H_{ML} \qquad (38)$$
where $H_\mu^{(j)}$ is the p.d. Hermitian matrix obtained at the
$j$th iteration of the equalisation routine. As before, by
applying a first-order perturbation expansion for the
eigenvalues of the matrix $H_\mu^{(j+1)}$, we can derive the
following LP problem:
$$\text{Find } \min(\alpha - \beta) \text{ subject to} \qquad (39)$$
$$\sigma^{(j)} + \mathcal{V}^{(j)} z \le \alpha \mathbf{1}, \quad \alpha > 0 \qquad (40)$$
$$\sigma^{(j)} + \mathcal{V}^{(j)} z \ge \beta \mathbf{1}, \quad \beta > 0 \qquad (41)$$
$$-\varepsilon \le z_k \le \varepsilon \quad \text{for } k = 1, \ldots, 2\ell \qquad (42)$$
where
$$\mathcal{V}^{(j)} = \left( v_i^{(j)H} F_k\, v_i^{(j)} \right), \quad k = 1, \ldots, 2\ell, \qquad (43)$$
with $i$ ranging over the noise-subspace indices,
$v_i^{(j)}$ is the $i$th eigenvector of the matrix $H_\mu^{(j)}$, with associated
eigenvalue $\sigma_i^{(j)}$, and $\sigma^{(j)}$ is the vector of noise-subspace
eigenvalues. Step size control of $\varepsilon$ is implemented
in the same fashion as before (33).
Clearly, the stable point of this third stage ($\hat{H}_\mu$,
say) would not result in exactly equal noise-subspace
eigenvalues, since (as in the first stage) the specified
entries have not been modified. Of course, it is possible
to use a transformation to reach this final goal.
Such a transformation keeps the eigenvectors of $\hat{H}_\mu$
invariant, and so the MUSIC-derived DOA estimates
for $\mu$ sources also remain the same. However, due to
the dimension reduction brought about by (14), the
LR (16) would change as a result of such a transformation.
Moreover, even with strictly equalised eigenvalues,
the Hermitian matrix $\hat{H}_\mu$ does not necessarily
correspond to the desired plane-waves-plus-noise
model.
Thus our fourth and final stage, that considers the
sequence of “ML” hypotheses $\hat{H}_\mu$ ($\mu = 1, \ldots, M_\alpha - 1$),
consists of a local ML refinement of the $\mu$ DOA estimates
and associated signal powers in the vicinity of
the MUSIC DOA estimates generated by the covariance
matrix $\hat{H}_\mu$. This local refinement procedure
is introduced in [1], and involves the specified covariance
lags only. As a result, for each candidate model
$\mu = 1, \ldots, M_\alpha - 1$, we can find the “ML” set of estimated
signal parameters $\{\hat{\theta}_j^{(\mu)}, \hat{p}_j^{(\mu)}\}$ and estimated
white noise power
K = <44)
3= 1
that uniquely describes the covariance matrix $R_\mu$ in the
hypothesis (15)
$$R_\mu = \hat{p}_0^{(\mu)} I_M + \sum_{j=1}^{\mu} \hat{p}_j^{(\mu)}\, s(\hat{\theta}_j^{(\mu)})\, s^H(\hat{\theta}_j^{(\mu)}). \qquad (45)$$
Obviously, whichever information-theoretic or Bayesian
criterion is used for hypothesis selection, such a selection
uniquely specifies not only the number of sources,
but also the DOA and power estimates.
4. FINAL COMMENTS
Simulations (not presented here) conducted for
the NLA $d_4$ with a superior number of sources demonstrate
that the detection performance achieved by the
four-stage algorithm described in this paper is comparable
to that produced by the standard AIC and
MDL criteria for conventional scenarios (with $m < M$
sources) with the same Cramér-Rao bound. Naturally,
in order to compare detection performance on
conventional and superior scenarios, it is necessary to
introduce significantly different intersource separations
and/or sample sizes; however, the comparable detection
performance in the two cases suggests that the
new detection scheme described here is close to optimum.
An additional justification for this conclusion
is that when our detection-estimation algorithm yields
the true number of superior sources, we obtain a DOA
estimation accuracy close to the corresponding
Cramér-Rao bound.
REFERENCES
[1] Y.I. Abramovich, D.A. Gray, A.Y. Gorokhov, and
N.K. Spencer. Positive-definite Toeplitz completion
in DOA estimation for nonuniform linear antenna
arrays, Part I: Fully augmentable arrays.
IEEE Trans. Sig. Proc., 46 (9):2458-2471, 1998.
[2] Y.I. Abramovich and N.K. Spencer. Detection-estimation
of more uncorrelated Gaussian sources
than sensors using partially augmentable sparse
antenna arrays. In Proc. EUSIPCO-2000, Tampere,
Finland. To appear September 2000.
[3] Y.I. Abramovich, N.K. Spencer, and A.Y.
Gorokhov. Detection of more uncorrelated Gaussian
sources than sensors in nonuniform linear antenna
arrays, Part I: Fully augmentable arrays.
IEEE Trans. Sig. Proc. Submitted Feb 2000.
[4] Y.I. Abramovich, N.K. Spencer, and A.Y.
Gorokhov. Detection of more uncorrelated Gaussian
sources than sensors in nonuniform linear antenna
arrays, Part II: Partially augmentable arrays.
IEEE Trans. Sig. Proc. In preparation.
[5] Y.I. Abramovich, N.K. Spencer, and A.Y.
Gorokhov. Detection of more uncorrelated Gaussian
sources than sensors using fully augmentable
sparse antenna arrays. In Proc. SAM-2000, Cambridge,
MA, 2000.
[6] Y.I. Abramovich, N.K. Spencer, and A.Y.
Gorokhov. DOA estimation for noninteger linear
antenna arrays with more uncorrelated sources
than sensors. IEEE Trans. Sig. Proc., 48 (4):943-955,
2000.
[7] P.M. Djuric. A model selection rule for sinusoids
in white Gaussian noise. IEEE Trans. Sig. Proc.,
44 (7):1744-1757, 1996.
[8] J.-J. Fuchs. Extension of the Pisarenko method to
sparse linear arrays. In Proc. ICASSP-95, pages
2100-2103, Detroit, 1995.
[9] R.A. Horn and C.R. Johnson. Matrix Analysis.
Cambridge University Press, England, 1990.
[10] A.T. Moffet. Minimum-redundancy linear arrays.
IEEE Trans. Ant. Prop., 16 (2):172-175, 1968.
[11] R.J. Muirhead. Aspects of Multivariate Statistical
Theory. Wiley, New York, 1982.
[12] P. Stoica and A. Nehorai. Performance study of
conditional and unconditional direction-of-arrival
estimation. IEEE Trans. Acoust. Sp. Sig. Proc.,
38 (10):1783-1795, 1990.
A NEW GERSCHGORIN RADII BASED METHOD FOR
SOURCE NUMBER DETECTION
Hsien-Tsai Wu and Chan-Li Chen
Department of Electronic Engineering,
Southern Taiwan University of Technology
No.1 Nan-Tai Street, Yung Kung City, Tainan County, Taiwan
ABSTRACT
In this paper, we introduce an effective use of the Gerschgorin radii
[1-2] of the unitarily transformed covariance matrix for source
number detection. The heuristic approach, which applies a new
Gerschgorin radii set developed from the projection concept,
overcomes the problems that arise with small data samples and an
unknown noise model. The proposed method uses the
sample correlation coefficient to normalize the signal
Gerschgorin radii for source number detection. The proposed
method shows improved detection capabilities
over GDE [1,2] in a Gaussian white noise process.
1. INTRODUCTION
Array processing, or more accurately, sensor array processing, is
the processing of the output signals of an array of sensors located
at different points in space in a wavefield. The purpose of array
processing is to extract useful information from the received
signals such as the number and location of the signal sources, the
propagation velocity of waves, as well as the spectral properties
of the signals. Array processing techniques have been employed
in various areas in which very different wave phenomena occur.
Common to all these applications, there are, in general, two
essential purposes in array processing: (i) to determine the
number of sources (decision), and (ii) to estimate the locations of
these sources (estimation).
Several high resolution detectors [3-5] for direction of arrival
(DOA) have been developed in the field of passive underwater
and radar signal processing in recent years. The primary
contributions to the field include the MUSIC method proposed
by Schmidt [3], the Minimum-Norm method by Kumaresan and
Tufts [4], and the ESPRIT method by Roy et al. [5]. It is well
known that the performances of these high resolution methods
largely depend on the successful determination of the number of
sources. Thus, several methods [6-11] have been suggested with
this purpose in mind. Wax and Kailath [6] bring a statistical
approach to solve the problem of source number detection based
on the AIC and the MDL methods, which are generally used for
the model selection.
In general, the AIC and MDL, including their modified versions,
remain the most widely-used methods for estimating the source
number. Most of them use the eigenvalues to estimate source
number but neglect to use the eigenvectors as well.
Consequently, Wu and Yang [1] proposed a heuristic approach
by applying the Gerschgorin theorem to find Gerschgorin radii of
the transformed covariance matrix for source number detection.
The heuristic detection criterion is developed from the concept of
eigenvectors' projection.
In this paper, a proper similarity transformation of the covariance
matrix is required in order to effectively utilize the sample
correlation coefficient to normalize the signal Gerschgorin radii
for source number detection.
2. GERSCHGORIN DISK METHOD FOR
SOURCE NUMBER DETECTION
2.1 Narrow Band Model
We first review the narrow band mathematical model for
estimating the number of sources and DOA of signals in a
spatially white noise environment. The model we consider here
consists of an $L$-dimensional complex data vector $x(k)$ which
represents the data received by an array of $L$ sensors at the $k$th
snapshot. The data vector is composed of plane-wave incident
narrowband signals, each of angular frequency $\omega_0$, from $M$ distinct
sources embedded in Gaussian noise. Thus, the measured array
data vector $x(k)$, which is assumed to be composed of $M$
incoherent directional sources corrupted by additive white noise,
is received at the $k$th snapshot by $L$ ($L > M$) sensors and is given
by:
$$x(k) = \sum_{i=1}^{M} s_i(k)\, a(\omega_i) + n(k) = A(\omega)\, s(k) + n(k), \qquad (1)$$
where $A(\omega) = [a(\omega_1)\ a(\omega_2)\ \cdots\ a(\omega_M)]$ is the direction matrix
composed of the direction vectors (steering vectors) of the signals and
$n(k)$ is the noise vector, which is assumed to be complex, zero-mean,
and Gaussian. The source vector is $s(k) = [s_1(k), s_2(k), \ldots, s_M(k)]^T$,
where $s_m(k)$ is the amplitude of the $m$th source and is assumed to
be jointly circular Gaussian and independent of $n(k)$. The exact
form of the steering vector depends on the array configuration.
However, the uniform linear array, apart from being most
commonly used, may also offer advantageous implementation
efficiency for some algorithms. For a propagation wavelength $\eta$,
the distance between two sensors in a uniform linear array must be
$D \le \eta/2$, and the corresponding steering vector is given by:
$$a(\omega_m) = [1, \exp(j\omega_m), \ldots, \exp(j(L-1)\omega_m)]^T, \qquad (2)$$
where $\omega_m = 2\pi D \sin\theta_m / \eta$, $D$ is the
spacing between adjacent elements, and $\theta_m$ is the impinging angle of
the $m$th source relative to the array broadside, with
$\theta_m \in (-\frac{\pi}{2}, \frac{\pi}{2})$ for all $m$. The vectors $a(\omega_m)$, $m = 1, 2, \ldots, M$,
corresponding to $M$ different values of $\theta_m$ are assumed to be
linearly independent. This implies that $L > M$ and rank$(A) = M$.
Note that it follows that $x(k)$ is a complex Gaussian vector with
zero mean and covariance matrix given by
$$C = E[x(k)\, x(k)^H] = A(\omega)\, C_s\, A^H(\omega) + \sigma_n^2 I, \qquad (3)$$
where $C_s$, which is the covariance matrix of $s(k)$, is assumed to be
non-singular, and $\sigma_n^2$ is the variance of the Gaussian noise.
Superscripts *, T, and H denote conjugate, transpose, and
Hermitian transpose of matrices, respectively.
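The structure of this covariance model can be checked numerically. The sketch below builds the direction matrix of a uniform linear array and the exact covariance for uncorrelated sources; the function name, the source powers and the noise variance are illustrative assumptions.

```python
import numpy as np

def ula_steering(n_sensors, doas_deg, spacing_over_wavelength=0.5):
    """Direction matrix of a uniform linear array; column m is
    [1, exp(j*w_m), ..., exp(j*(L-1)*w_m)]^T with
    w_m = 2*pi*(spacing/wavelength)*sin(theta_m)."""
    w = 2 * np.pi * spacing_over_wavelength * np.sin(np.deg2rad(doas_deg))
    return np.exp(1j * np.outer(np.arange(n_sensors), w))

L = 6
doas = np.array([-12.0, 10.0])     # M = 2 sources
A = ula_steering(L, doas)
C_s = np.diag([10.0, 10.0])        # hypothetical uncorrelated source powers
sigma2 = 1.0                       # hypothetical noise variance
C = A @ C_s @ A.conj().T + sigma2 * np.eye(L)
eigvals = np.linalg.eigvalsh(C)    # ascending order
```

With the exact covariance, the $L - M$ smallest eigenvalues all equal the noise variance, which is the property that eigenvalue-based detectors exploit.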
If N observations have been measured from L sensors, the entire
data set can be placed in an $L \times N$ matrix $X$ as:
$$X = [x(1), x(2), \ldots, x(k), \ldots, x(N)]_{L \times N}. \qquad (4)$$
Each column of $X$ represents a multivariate observation. For the
$L$-dimensional scatterplot, the columns of $X$ represent $N$ points in
$L$-dimensional space. Subsequently, the array sampled covariance
matrix in Eq.(3) can also be expressed as:
$$\hat{C} = \frac{1}{N} X X^H. \qquad (5)$$
2.2 Gerschgorin Disk Estimator
To make the Gerschgorin disk theorem effective, Wu et al. [2]
proposed a proper transformation, called the Gerschgorin Disk
Estimator (GDE), for source number detection. The covariance
matrix is first partitioned as:
$$C = \begin{bmatrix} C_1 & c \\ c^H & c_{LL} \end{bmatrix}, \qquad (6)$$
where $C_1$ is the $(L-1) \times (L-1)$ leading principal submatrix of $C$,
which is obtained by deleting the last row and column of $C$.
Physically, this can be regarded as the removal of the $L$th sensor.
Thus, $C_1$ becomes the reduced covariance matrix of the remaining
$(L-1)$ sensors. The reduced covariance matrix $C_1$ can also be
decomposed by its eigenstructure as $C_1 = U_1 D_1 U_1^H$, where $U_1$ is
an $(L-1) \times (L-1)$ unitary matrix formed by the eigenvectors of $C_1$ as
$U_1 = [q_1'\ q_2'\ \cdots\ q_M'\ \cdots\ q_{L-1}']$, and $D_1$ is the diagonal matrix
constructed from the corresponding eigenvalues as:
$$D_1 = \mathrm{diag}(\lambda_1', \lambda_2', \ldots, \lambda_{L-1}'). \qquad (7)$$
The eigenvalues $\lambda_1' \ge \lambda_2' \ge \cdots \ge \lambda_M' \ge \lambda_{M+1}' \ge \cdots \ge \lambda_{L-1}'$ are
arranged in descending order. Since the $\lambda_i'$ in Eq.(7) are the
eigenvalues of the leading principal submatrix of $C$, they
satisfy the interlacing property
$\lambda_1 \ge \lambda_1' \ge \lambda_2 \ge \cdots \ge \lambda_{L-1} \ge \lambda_{L-1}' \ge \lambda_L$. The
transformed covariance matrix becomes:
$$S = U^H C U = \begin{bmatrix}
\lambda_1' & 0 & \cdots & 0 & \rho_1 \\
0 & \lambda_2' & \cdots & 0 & \rho_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \lambda_{L-1}' & \rho_{L-1} \\
\rho_1^* & \rho_2^* & \cdots & \rho_{L-1}^* & c_{LL}
\end{bmatrix}, \qquad (8)$$
where
$$\rho_i = q_i'^{H} c \qquad (9)$$
for $i = 1, 2, \ldots, L-1$.
It is clear that the first $(L-1)$ Gerschgorin disks (i.e. $O_1, O_2, \ldots, O_{L-1}$)
possess the Gerschgorin radii
$$r_i = |\rho_i| = |q_i'^{H} c|, \qquad (10)$$
for $i = 1, 2, \ldots, L-1$. It can be verified that all of the $\rho_i$
values are equal to zero for $i = M+1, M+2, \ldots, L-1$, because
the noise eigenvectors $q_i'$ are orthogonal to $A_1$, which is
the direction matrix of $C_1$.
Since $S$ is a unitary transformation of $C$, the two matrices share the
same eigenvalues. Each of the first $(L-1)$ Gerschgorin
disks $O_i$ has its Gerschgorin center at $c_i = \lambda_i'$ and the
corresponding Gerschgorin radius $r_i = |\rho_i|$, $i = 1, 2, \ldots, L-1$. The
disks with zero radii (i.e. $O_{M+1}, O_{M+2}, \ldots, O_{L-1}$) are regarded as
the collection of noise Gerschgorin disks. The remaining disks
(i.e. $O_1, O_2, \ldots, O_M$), which have non-zero radii and large centers,
are considered to be the source Gerschgorin disks. Hence, we can
determine the number of sources by counting the number of non-zero
Gerschgorin radii in the case of infinite samples. In addition,
we can also use the $(L-1)$ eigenvalues of $C_1$ to determine the number
of sources.
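The behaviour of the radii in the infinite-sample case can be illustrated numerically. The following sketch assumes that the unitary matrix is the block-diagonal extension of $U_1$ by a unit entry, which reproduces the arrowhead form of the transformed matrix; the function name and the covariance parameters are hypothetical.

```python
import numpy as np

def gerschgorin_disks(C):
    """Centers and radii of the first L-1 Gerschgorin disks of U^H C U,
    where U is U1 (eigenvectors of the leading block) extended by a 1."""
    C1, c = C[:-1, :-1], C[:-1, -1]
    lam, U1 = np.linalg.eigh(C1)
    order = np.argsort(lam)[::-1]              # descending eigenvalues
    lam, U1 = lam[order], U1[:, order]
    rho = U1.conj().T @ c                      # rho_i = q_i'^H c
    return lam, np.abs(rho)

# Exact covariance of L = 6 sensors and M = 2 sources (illustrative values).
L, M = 6, 2
w = np.pi * np.sin(np.deg2rad([-12.0, 10.0]))  # half-wavelength spacing
A = np.exp(1j * np.outer(np.arange(L), w))
C = 10.0 * (A @ A.conj().T) + np.eye(L)
centers, radii = gerschgorin_disks(C)
```

With the exact covariance, the last $L - 1 - M$ radii vanish up to rounding error and their centers sit at the noise variance, while the first $M$ disks have large centers and non-zero radii.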
It can be seen that the threshold must be adjustable to varying
numbers of snapshots. Hence, a heuristic decision rule is defined
as [2]:
$$\mathrm{GDE}(k) = r_k - \frac{D(N)}{L-1} \sum_{i=1}^{L-1} r_i > 0, \qquad (11)$$
where $k$ is an integer in the closed interval $[1, L-2]$. The
adjustable factor $D(N)$ could be a non-increasing function
(between 1 and 0) of $N$. If GDE($k$) is evaluated
from $k = 1$, the number of sources is determined as $k-1$ (i.e. $M = k-1$)
when the first nonpositive value of GDE($k$) is reached. This is
because a radius below the adjustable
threshold is assigned to the noise collection. Thus, the
above GDE rule may produce problems of underestimation.
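A threshold rule of this type can be sketched in a few lines. The code assumes the threshold is the adjustable factor times the mean radius, and it returns the largest admissible count when no crossing occurs; the function name, that fallback and the radii values are assumptions made for the illustration.

```python
def gde_source_count(radii, d_n):
    """Evaluate r_k - d_n * mean(radii) for k = 1, ..., L-2 and return k-1
    at the first non-positive value."""
    n_disks = len(radii)                      # this is L - 1
    threshold = d_n * sum(radii) / n_disks
    for k in range(1, n_disks):               # k = 1, ..., L-2
        if radii[k - 1] - threshold <= 0:
            return k - 1
    return n_disks - 1                        # assumption: no crossing, report L-2

# Hypothetical radii for L = 6: two clear signal disks, three noise disks.
count_ok = gde_source_count([5.0, 4.0, 0.1, 0.05, 0.02], d_n=0.5)
# Same setting, but the first signal radius happens to be small.
count_low = gde_source_count([0.3, 4.0, 0.1, 0.05, 0.02], d_n=0.5)
```

The second call stops at the first disk and reports no sources, which illustrates how a small signal radius leads to underestimation.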
3. A NEW GERSCHGORIN RADII BASED
METHOD
Hence, a method capable of reducing the radii of the signal
Gerschgorin disks should help resolve the source number detection
problem.
3.1 Correlation Coefficients of Samples Space
In light of these requirements, an effective source number
detection method must select a proper transformation for
maximum reduction of the radii of the signal Gerschgorin disks,
and make the noise Gerschgorin disks as remote as possible from
the signal Gerschgorin disks. Therefore, a nonsingular matrix,
$D = \mathrm{diag}(\lambda_1', \lambda_2', \ldots, \lambda_M', \ldots, \lambda_{L-1}', 1)$, was used in [2] to get small signal
Gerschgorin radii, such as
$i = 1, 2, \ldots, M$. That method
led to the development of a novel technique, which outperformed GDE
in Gaussian white and nonwhite noise processes and could be
used successfully even when the SNR is near 0 dB. In this paper, we
extend the idea of reducing the signal Gerschgorin disks by
using a newly developed similarity transformation of the sampled
covariance matrix and its new set of normalized radii of signal
Gerschgorin disks.
As in Eq.(4), if N observations have been measured from L sensors,
the entire data set can also be placed in an $L \times N$ matrix $X$ as
$X = [x(1), x(2), \ldots, x(L-1), x(L)]_{L \times N}$. According to the definition of
multiple linear regression [12], the maximum correlation
coefficient is defined as
(12)
for $i = 1, 2, \ldots, L$ and $k = 1, 2, \ldots, L$. Note that $\rho_{ik} = \rho_{ki}$ for all $i$ and
$k$. The value of $\rho_{ik}$ must be between 0 and +1.
Without altering the true eigenvalues, a proper transformation of
the covariance matrix is required in order to effectively utilize the
sample correlation coefficient to normalize the signal Gerschgorin
radii for source number detection.
3.2 The Proposed Method
In this section, a new transformation kernel based on the concept
of sample correlation coefficient is proposed in order to improve
detection performance. Now, a novel transforming matrix is
proposed:
fi=diag(VcTr tMi . .
=diag(q>i TV •HV - HJ/-i,l), (13)
to the transformed matrix in Eq.(14), where are the
eigenvalues of the first (L-l)x(L-l) leading principal submatrix
of C.
The new transformed true covariance matrix becomes:
$S' = D^{-1} U^H C U D = D^{-1} \begin{bmatrix} U_1^H C_1 U_1 & U_1^H c \\ c^H U_1 & c_{LL} \end{bmatrix} D$ (14)
According to the Gerschgorin disk theorem, it is clear that the
first (L-1) Gerschgorin disks (i.e., $O_1, O_2, \ldots, O_{L-1}$) contain the
new Gerschgorin radii:
for i = 1, 2, ..., M. The radius $r'_i$ in Eq. (15) can be regarded as a
correlation coefficient of the covariance matrix, as in Eq. (12). The
values of $r'_i$ are therefore all less than 1, so that $r_i > r'_i$. In
other words, the signal Gerschgorin disks can be made as small
as possible and the noise Gerschgorin disks can be kept as remote
from the signal Gerschgorin disks as possible. Therefore, the
source number can be easily determined by visually counting the
number of signal Gerschgorin disks derived from Eq. (14).
Moreover, when the noise statistics cannot be accurately
estimated, the GDE method fails in low-SNR situations,
whereas the proposed method may not.
For example, in the case of one simulated covariance matrix, the
sensor number is 6 (i.e. L=6) and two sources (i.e. M=2) are
uncorrelated and impinge from -12° and 10° (i.e. DOA=[-12°
10°]). The signal-to-noise ratios are both 2 dB (i.e. SNR=[10
10] dB) and the number of samples chosen is N=100. Its
Gerschgorin disks in terms of Gerschgorin center-and-radius
pairs become {12.11, 0.42}, {7.93, 4.71}, {0.19, 0.18},
{0.09, 0.36}, and {0.08, 0.03}. The results are illustrated in Figure
1(a). Subsequently, the same covariance matrix is transformed by
the suggested unitary transformation as shown in Eq. (14). The
results are illustrated in Figure 1(b). It is now evident that the
Gerschgorin disks form two separate collections: the source
collection, which contains disks $O_1$ and $O_2$ with small radii (less
than 1), and the noise collection $O_3 \cap O_4 \cap O_5$ with small radii.
Fig.l(a)(b).Gerschgorin disks of the estimated covariance
matrix
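As an illustration of the disk construction discussed above, the following is a minimal sketch, not the authors' implementation. It assumes the unitary transformation of [1]: the leading (L-1)x(L-1) submatrix of C is diagonalized, the disk centers are its eigenvalues, and the radii are the magnitudes of the transformed last-column entries. The function names, the half-wavelength steering model, and the exact (noise-free-estimate) covariance are assumptions made for the example; the diagonal scaling D of Eq. (13) is not applied.

```python
import numpy as np

def steering(L, doa_deg):
    # Half-wavelength uniform linear array (assumed geometry).
    theta = np.deg2rad(np.asarray(doa_deg, dtype=float))
    n = np.arange(L)[:, None]
    return np.exp(-1j * np.pi * n * np.sin(theta)[None, :])

def gerschgorin_disks(C):
    # Partition C = [[C1, c], [c^H, c_LL]] and diagonalize C1 = U1 diag(lam) U1^H.
    L = C.shape[0]
    C1, c = C[:L - 1, :L - 1], C[:L - 1, L - 1]
    lam, U1 = np.linalg.eigh(C1)
    order = np.argsort(lam)[::-1]          # largest eigenvalue first
    lam, U1 = lam[order], U1[:, order]
    radii = np.abs(U1.conj().T @ c)        # r_i = |u_i^H c|
    return lam, radii

L, sigma2 = 6, 1.0
A = steering(L, [-12.0, 10.0])
P = 10.0 * np.eye(2)                       # uncorrelated sources (assumed powers)
C = A @ P @ A.conj().T + sigma2 * np.eye(L)
centers, radii = gerschgorin_disks(C)
for ctr, rad in zip(centers, radii):
    print(f"center {ctr:8.3f}   radius {rad:8.3f}")
```

With an exact covariance the three noise disks collapse to zero radius at the noise power; with a sample covariance from N snapshots they acquire small nonzero radii, which is the situation the scaling by D is meant to improve.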
4. SIMULATION RESULTS
A uniform linear array of 8 isotropic sensors, spaced a half
wavelength apart, is used with additive, uncorrelated white noise. The
VGD and GDE methods are used to detect two uncorrelated
sources with SNRs of 6 dB impinging from 0° and 5°, respectively.
After 200 Monte Carlo runs, we compute the relative frequency
of false detection for various numbers of snapshots. The detection
error performance, in terms of probabilities, is depicted in
Figure 2. It can be seen that the proposed method outperforms
GDE.
Fig. 2. Detection performance of the AIC, MDL, GDE, and the
proposed method using simulated data with Gaussian white
noise (SNR=[6 6] dB, DOA=[0° 5°]).
5. CONCLUSION
In this paper, GDE performance is improved by using a newly
developed similarity transformation of the covariance matrix
and using its new set of Gerschgorin radii to design the
source number estimators. The proposed method uses
the sample correlation coefficient to normalize the
signal Gerschgorin radii for source number detection. The
proposed method shows detection capabilities superior to
GDE in Gaussian white noise processes and can be used
successfully with measured experimental data.
ACKNOWLEDGMENT
This research was supported by the National Science
Council under Grant #NSC88-2612-E-218-001, Taiwan,
Republic of China.
6. REFERENCES
[1] H. T. Wu, J. F. Yang, and F. K. Chen, "Source number
estimators using transformed Gerschgorin radii," IEEE
Trans. SP, vol. 43, pp. 1325-1333, Jun. 1995.
[2] H. T. Wu and J. F. Yang, "Gerschgorin radii based source
number detection for closely spaced signals," Proc. IEEE
Int. Conf. Acoust., Speech, Signal Processing, Atlanta,
pp. 3054-3057, May 1996.
[3] R. O. Schmidt, "Multiple emitter location and signal
parameter estimation," in Proc. RADC Spectrum Estimation
Workshop, pp. 243-258, Oct. 1979.
[4] R. Kumaresan and D. W. Tufts, "Estimating the angles of
arrivals of multiple plane waves," IEEE Trans. Aerospace
and Electronic Systems, vol. AES-19, pp. 134-139, 1983.
[5] R. Roy and T. Kailath, "ESPRIT-estimation of signal
parameters via rotational invariance techniques," IEEE Trans.
ASSP, vol. ASSP-37, pp. 984-995, July 1989.
[6] M. Wax and T. Kailath, "Detection of signals by information
theoretic criteria," IEEE Trans. ASSP, vol. 33, no. 2,
pp. 387-392, April 1985.
[7] K. M. Wong, Q. T. Zhang, J. P. Reilly, and P. C. Yip, "On
information theoretic criteria for determining the number of
signals in high resolution array processing," IEEE Trans.
ASSP, vol. 38, no. 11, pp. 1959-1970, Nov. 1990.
[8] M. Wax, "Detection and localization of multiple sources in
spatially colored noise," IEEE Trans. SP, vol. 40, no. 1,
pp. 245-249, Jan. 1992.
[9] Q. Wu and D. R. Fuhrmann, "A parametric method for
determining the number of signals in narrow-band direction
finding," IEEE Trans. SP, vol. 39, no. 8, pp. 1848-1857,
Aug. 1991.
[10] M. Wax, "Detection and localization of multiple sources in
spatially colored noise," IEEE Trans. SP, vol. 40, no. 1,
pp. 245-249, Jan. 1992.
[11] W. Wu, J. Pierre, and M. Kaveh, "Practical detection with
calibrated arrays," Proc. of Statistical Signal and Array
Processing Workshop, pp. 82-85, Canada, Oct. 1992.
[12] Richard A. Johnson and Dean W. Wichern, Applied
Multivariate Statistical Analysis, Prentice-Hall, Inc., New
Jersey, 1988.
ADAPTING MULTITAPER SPECTROGRAMS TO LOCAL FREQUENCY
MODULATION
James W. Pitton
Applied Physics Laboratory, University of Washington
1013 NE 40th St.
Seattle, WA 98105
E-mail: pitton@apl.washington.edu
ABSTRACT
This paper presents further extensions to the multitaper
time-frequency spectrum estimation method developed
by the author. The method uses time-frequency
(TF) concentrated basis functions which diagonalize
the nonstationary spectrum generating operator over
a finite region of the TF plane. Individual spectrograms
computed with these eigenfunctions form direct
TF spectrum estimates, and are combined to form the
multitaper TF spectrum estimate. A method is presented
for adapting the multitaper spectrogram to locally
match frequency modulation in the signal, which
can cause broadening of the spectral estimate. An F-test
for detecting and removing frequency-modulated
tones is also given.
1. INTRODUCTION
Thomson’s multitaper spectral estimation approach [1]
is a powerful method for nonparametric spectral estimation.
This method uses a set of orthogonal data tapers
that are maximally concentrated in frequency and
diagonalize the spectral generating operator. These
tapers are used to approximately invert the operator
and estimate the spectrum. The multitaper approach
was first applied to time-frequency (TF) analysis by
a direct extension to the nonstationary case through a
sliding-window framework [2], in which spectrograms
are computed with each of the tapers and combined
to form an estimate of the TF spectrum. A multitaper
TF spectrum was constructed using spectrograms
computed with Hermite windows [3], which had previously
been shown to maximize a TF concentration
measure [4]. This method was extended to include a
means of reducing artifacts using a TF mask [5]. More
recently, a multitaper method for TF analysis was presented
by this author [6] that diagonalized the nonstationary
spectral generating operator, formally extending
Thomson’s approach to TF. Subsequent work by
the author gave bias and variance measures for the estimated
TF spectrum, presented an adaptive procedure
to reduce the bias of the individual spectrograms, and
derived other properties of the eigenfunctions and the
resulting TF spectral estimate [7, 8].
In this paper, a method is presented for adapting
the multitaper spectrogram to locally match frequency
modulation in the signal, which can cause broadening
of the spectral estimate. Frequency modulation (FM)
in the signal will degrade the resolution and accuracy
of the multitaper spectrogram due to well-known spectral
broadening effects. One common way of alleviating
the effects of the spectral broadening is to match
the spectrogram to the FM by frequency-modulating
the window. This approach works perfectly well when
there is only one FM rate in the signal, as is the case
with chirped sonar and radar. However, in multicomponent
signals such as speech, biological, and mechanical
signals, there can be multiple FM rates present
at any given time. To accurately analyze these types
of signals, it is necessary to locally adapt the multitaper
spectrogram to the FM at a given TF region.
This paper presents a method for performing this local
adaptation. An F-test for detecting and removing
frequency-modulated tones is also given.
This work was supported by the National Science Foundation
and the Office of Naval Research.
0-7803-5988-7/00/$10.00 © 2000 IEEE
2. BACKGROUND: MULTITAPER
TIME-FREQUENCY SPECTROGRAMS
This approach to TF spectral estimation is based on a
straightforward extension of the spectral representation
theorem for stationary processes [9], and is equivalent
to a linear time-varying (LTV) filter model. Define the
signal $s(t)$ as the output of a white-noise-driven LTV
filter. The signal can then be written as:
$s(t) = \int H(t,\omega) e^{j\omega t} \, dZ(\omega)$, (1)
where $H(t,\omega)$ is defined as the Fourier transform of the
LTV filter $h(t, t-\tau)$ [10]. The TF spectrum is defined
by:
$P(t,\omega) = |H(t,\omega)|^2$. (2)
This formulation for a TF spectrum is of the same general
form as Priestley’s evolutionary spectrum [9]; however,
$H(t,\omega)$ is not constrained to be slowly-varying.
Given a signal $s(t)$, an estimate $\hat{P}(t,\omega)$ is desired;
however, direct inversion of equation (1) is impossible.
A rough estimate of the time-varying frequency content
of $s(t)$ may be obtained by computing its short-time
Fourier transform (STFT):
$S_s(t,\omega) = \int s(\tau) g(t-\tau) e^{-j\omega\tau} \, d\tau$, (3)
where $g(t)$ is a rectangular window of length T. A
relationship between the STFT and $H(t,\omega)$ is obtained
by replacing $s(t)$ by its TF spectral formulation:
$S_s(t,\omega) = \int\!\!\int H(\tau,\theta) g(t-\tau) e^{-j(\omega-\theta)\tau} \, dZ(\theta) \, d\tau$. (4)
To solve for the time-varying spectrum $H(t,\theta)$, the
STFT operator $g(t-\tau) e^{-j\omega\tau}$ must be inverted. This
inversion is an inherently ill-posed problem. Instead,
the inverse solution is approximated by regularizing
it to some region $R(t,\omega)$ in the TF plane, much as
Thomson regularized the spectral inversion to a bandwidth
W in his multitaper approach [1]. For simplicity
throughout, $R(t,\omega)$ is defined to be a square TF region
of dimension $\Delta T \times \Delta W$; however, the results readily
generalize to arbitrary regions.
In the case of spectral estimation, the operator is
square and Toeplitz; its regularized inverse is found
through an eigenvector decomposition. Such is not the
case in the TF problem; the STFT operator is neither
full rank nor square. This operator is diagonalized using
a Singular Value Decomposition, giving left and
right eigenvectors $u(\tau)$ and $V(t,\omega)$ and the associated
eigen (singular) values $\lambda$:
$g(t-\tau) e^{-j\omega\tau} = \sum_k \lambda_k u_k(\tau) V_k^*(t,\omega)$. (5)
The eigenvectors $u(\tau)$ and $V(t,\omega)$ form an STFT pair:
$V(t,\omega) = \int u(\tau) g(t-\tau) e^{-j\omega\tau} \, d\tau$. (6)
The SVD relationship between $u(\tau)$ and $V(t,\omega)$ is obtained
by applying the STFT operator to $V(t,\omega)$, computing
the integrals only over $\Delta T \times \Delta W$:
$\lambda u(\tau) = \int_{\Delta T}\!\int_{\Delta W} V(t,\omega) g(t-\tau) e^{j\omega\tau} \, d\omega \, dt$. (7)
The inverse STFT computed over all $(t,\omega)$ also holds.
This equation can be reduced to a standard eigenvector
equation by substituting for $V(t,\omega)$. The eigenvalue
equation for $u(\tau)$ is then:
$\lambda u(\tau) = \int 2\Delta W \, \mathrm{sinc}(\Delta W(\tau - s)) f(\tau,s) u(s) \, ds$, (8)
where
$f(\tau,s) = \int_{\Delta T} g(t-s) g(t-\tau) \, dt$. (9)
$u(\tau)$ can be computed using standard eigenvalue solution
methods. As has been discussed elsewhere, the
eigenvectors are concentrated in TF and doubly orthogonal,
both over the entire TF plane and over $\Delta T \times \Delta W$.
These properties are critical for the estimation method.
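The eigenvalue equation (8)-(9) can be discretized directly. The sketch below is one possible discretization, not the author's code: it assumes a rectangular window of T samples, unit sampling, a region of dT time samples, and the unnormalized sinc convention sin(x)/x (so that the kernel is the integral of exp(j w d) over |w| <= dW). The function name and default sizes are hypothetical.

```python
import numpy as np

def tf_eigen_tapers(T=32, dT=16, dW=0.2, n_keep=4):
    """Discretize eqs. (8)-(9) for a rectangular window g of length T."""
    t = np.arange(dT)                    # times inside the region Delta T
    taus = np.arange(-(T - 1), dT)       # lags where g(t - tau) can be nonzero
    diff = t[None, :] - taus[:, None]
    G = ((diff >= 0) & (diff < T)).astype(float)   # G[i, n] = g(t_n - tau_i)
    f = G @ G.T                          # eq. (9): sum over t of g(t-s) g(t-tau)
    d = taus[:, None] - taus[None, :]
    # 2*dW*sin(dW*d)/(dW*d); note np.sinc(x) = sin(pi x)/(pi x)
    kernel = 2.0 * dW * np.sinc(dW * d / np.pi) * f
    lam, U = np.linalg.eigh(kernel)      # symmetric kernel: real eigenpairs
    order = np.argsort(lam)[::-1][:n_keep]
    return lam[order], U[:, order], kernel

lam, U, kernel = tf_eigen_tapers()
print("leading eigenvalues:", np.round(lam, 3))
```

The columns of U are the discrete tapers u_k, ordered by decreasing eigenvalue; only the first few (roughly the TF area of the region) carry significant eigenvalues.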
Next, $H(t,\omega)$ is estimated regularized to the rectangular
region $\Delta T \times \Delta W$ by projecting it onto $\Delta T \times \Delta W$
in the vicinity of $(t,\omega)$ using the kth left eigenvector
$u_k(t)$:
Hk(t,u) = A"* / [ H(T,0)uk(t-T)eM~^TdZ(9)dT.
JatJaw
(10)
$H_k$ is thus a direct, but unobservable, projection of
$H(t,\omega)$ onto $\Delta T \times \Delta W$.
These expansion coefficients are then estimated using
the STFT of $s(t)$ computed using $u_k(t)$:
$S_k(t,\omega) = \int\!\!\int H(\tau,\theta) u_k(t-\tau) e^{-j(\omega-\theta)\tau} \, dZ(\theta) \, d\tau$, (11)
i.e., the kth eigenspectrum $S_k(t,\omega)$ is a projection of
$H(t,\omega)$ onto the kth left eigenvector $u_k(t)$, estimating
$H_k(t,\omega)$ over $\Delta T \times \Delta W$. When $s(t)$ is a stationary
white noise process, it follows that
$E[|S_k(t,\omega)|^2] = |H(t,\omega)|^2 = P(t,\omega)$. (12)
Thus, the individual eigenspectra are direct estimates
of $P(t,\omega)$, and are unbiased when the spectrum is white.
Next, $H(t,\omega)$ is estimated over $\Delta T \times \Delta W$ using the
right eigenvectors $V_k(t,\omega)$ weighted by the projections
of $H(t,\omega)$ onto $u_k(t)$, i.e., the kth spectrogram:
$\hat{H}(\tilde{t},\tilde{\omega} \,|\, t,\omega) = \sum_{k=1}^{K} V_k(\tilde{t}-t, \tilde{\omega}-\omega) S_k(t,\omega)$, (13)
where $K \approx \Delta T \Delta W$. Choosing $\Delta T \Delta W$ too small will
result in estimates with poor bias and variance properties.
The magnitude-square of $\hat{H}(\tilde{t},\tilde{\omega} \,|\, t,\omega)$ is an estimate
of $P(t,\omega)$ over $\Delta T \times \Delta W$. This estimate is a $\chi^2$
random variable with two degrees of freedom (except
for DC and Nyquist) with variance $P^2(t,\omega)$. The variance
of this estimate can be reduced by averaging over
$\Delta T \times \Delta W$ and invoking the orthogonality of $V_k(t,\omega)$:
$\hat{P}(t,\omega) = \frac{1}{\Delta T \Delta W} \int_{\Delta T}\!\int_{\Delta W} |\hat{H}(\tilde{t},\tilde{\omega} \,|\, t,\omega)|^2 \, d\tilde{t} \, d\tilde{\omega} = \frac{1}{\Delta T \Delta W} \sum_{k=1}^{K} \lambda_k |S_k(t,\omega)|^2$. (14)
The average of K direct estimates is a $\chi^2$ random variable
with 2K degrees of freedom; hence, the variance
of this estimate is $P^2(t,\omega)/K$. If $\Delta T$ is chosen to be a
fixed proportion of the window length T, then this estimator
is consistent for fixed $\Delta W$. Note that the form
of this estimator differs slightly from that presented
previously [6, 7, 8] in the weighting by the eigenvalues.
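The weighted combination of eigenspectra described above can be sketched as follows. This is an illustrative sketch only: orthonormal sine tapers stand in for the TF-concentrated eigenfunctions $u_k$, the weights stand in for the eigenvalues $\lambda_k$, and the estimate is normalized by the sum of the weights rather than by $\Delta T \Delta W$. All function names are hypothetical.

```python
import numpy as np

def sine_tapers(T, K):
    # Orthonormal sine tapers, used here only as a stand-in taper family.
    n = np.arange(1, T + 1)
    k = np.arange(1, K + 1)[:, None]
    return np.sqrt(2.0 / (T + 1)) * np.sin(np.pi * k * n / (T + 1))

def multitaper_spectrogram(s, tapers, weights, hop):
    K, T = tapers.shape
    starts = np.arange(0, len(s) - T + 1, hop)
    frames = np.stack([s[i:i + T] for i in starts])            # (frames, T)
    # One STFT per taper: S[f, k, w] is the kth eigenspectrum of frame f.
    S = np.fft.fft(frames[:, None, :] * tapers[None, :, :], axis=-1)
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    return np.einsum('k,fkw->fw', w, np.abs(S) ** 2)           # weighted average

rng = np.random.default_rng(0)
s = rng.standard_normal(4096)            # unit-variance white noise
tapers = sine_tapers(64, 4)
P = multitaper_spectrogram(s, tapers, np.ones(4), hop=16)
print(P.shape, P.mean())
```

For unit-variance white noise the average of the estimate is close to 1, which matches the unbiasedness statement of equation (12) under these assumptions.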
3. LOCALLY STATIONARY PROCESSES
The estimate for $P(t,\omega)$ given in equation (14) is unbiased
for white noise. For the estimate to be unbiased
for signals other than white noise, it is only necessary
that $P(t,\omega)$ be locally white in TF, since the estimate
is regularized to $\Delta T \times \Delta W$. A similar requirement is
seen in the stationary case [1], wherein the spectrum
is assumed to be smoothly varying so that it is approximately
white over $\Delta W$. A class of stochastic processes
known as locally stationary processes [12] satisfies
the requirement of being smoothly varying in TF, and
can be used to describe a wide variety of nonstationary
signals. Locally stationary processes are stochastic
processes with covariance functions of the form
$R(t_1,t_2) = E[s(t_1) s^*(t_2)] = g\left(\frac{t_1+t_2}{2}\right) f(t_1-t_2)$, (15)
where $g(\cdot)$ is a nonnegative function and $f(\cdot)$ is a valid
covariance function; that is, $f(t)$ possesses a nonnegative
Fourier transform $F(\omega)$. Through a change of
variables, the symmetric form of the covariance function
is seen to be:
$R_s(t,\tau) = E[s(t+\tau/2) s^*(t-\tau/2)] = g(t) f(\tau)$. (16)
The TF spectrum is thus given by [11]:
$P_s(t,\omega) = g(t) F(\omega)$. (17)
For locally stationary $s(t)$, $P_s(t,\omega)$ will be approximately
constant over $\Delta T \times \Delta W$, and equation (12)
will still hold.
The class of processes with such nonnegative TF
spectra is easily extended to include a wider range of
nonstationary processes [13]. Let $s(t)$ be a locally stationary
process with covariance function $R_s(t,\tau)$ and
corresponding TF spectrum $P_s(t,\omega)$. Then the linearly
frequency modulated signal $s(t) e^{j\beta t^2/2}$ will have covariance
$R_s(t,\tau) e^{j\beta t\tau}$ and corresponding nonnegative
TF spectrum $P_s(t,\omega-\beta t)$. More generally, let
$x(t) = s(t) e^{j\phi(t)}$, where $s(t)$ is locally stationary with symmetric
covariance function $R_s(t,\tau)$ from equation (16).
Then the covariance of $x(t)$ is
$R_x(t,\tau) = g(t) f(\tau) e^{j(\phi(t+\tau/2) - \phi(t-\tau/2))}$. (18)
By making use of the principle of stationary phase [14],
it can be shown [13] that the TF spectrum of $x(t)$ is
given by:
$P_x(t,\omega) = g(t) F(\omega - \phi'(t)) = P_s(t,\omega - \phi'(t))$. (19)
Thus, a frequency modulated locally stationary (FMLS)
process will have a TF spectrum equal to that of the
locally stationary process centered around the instantaneous
frequency of the FM. The generalization can
be taken one step further to define a composite FMLS
process, consisting of a sum of statistically independent
FMLS processes. The composite signal will also have
a nonnegative TF spectrum equal to the sum of the
spectra of the individual processes.
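As a check on equations (18) and (19), the linear-FM case can be worked out explicitly. The derivation below assumes the Fourier convention $F(\omega) = \int f(\tau) e^{-j\omega\tau} d\tau$, which the text does not state; in this special case no stationary-phase approximation is needed.

```latex
\begin{align*}
\phi(t) &= \tfrac{1}{2}\beta t^2, \qquad \phi'(t) = \beta t, \\
\phi(t+\tau/2) - \phi(t-\tau/2)
  &= \tfrac{\beta}{2}\left[(t+\tau/2)^2 - (t-\tau/2)^2\right] = \beta t \tau, \\
R_x(t,\tau) &= g(t)\, f(\tau)\, e^{j\beta t\tau}, \\
P_x(t,\omega) &= \int R_x(t,\tau)\, e^{-j\omega\tau}\, d\tau
  = g(t) \int f(\tau)\, e^{-j(\omega-\beta t)\tau}\, d\tau
  = g(t)\, F(\omega - \beta t).
\end{align*}
```

The result agrees with the linear-FM statement in the text, $P_s(t,\omega-\beta t)$, and with equation (19) evaluated at $\phi'(t) = \beta t$.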
However, when $s(t)$ is an FMLS process, $P(t,\omega)$
will most certainly not be constant over $\Delta T \times \Delta W$,
and equation (12) will fail to be valid. In this case, the
smoothing region $\Delta T \times \Delta W$ must be oriented to match
the FM of the signal. This reorientation is equivalent
to matching the spectrogram window to the FM of the
signal. This matching can be accomplished by using
a frequency modulated window in the original STFT
computation. However, in signals with multiple FM
rates, as in a composite FMLS signal, this adaptation
must be performed locally in TF, as discussed next.
4. LOCALLY MATCHED MULTITAPER
SPECTROGRAMS
To locally demodulate the spectrograms, it is first necessary
to construct a reliable estimate of the local FM,
which is denoted by $\beta(t,\omega)$. Letting the TF dependence
be implicit, $\beta$ can be estimated by computing a
local covariance of the multitaper spectrogram normalized
by the time spread:
$\langle (t-\bar{t})(\omega-\bar{\omega}) \rangle / \langle (t-\bar{t})^2 \rangle$, where
$\bar{t}$ and $\bar{\omega}$ are the local average time and frequency, respectively;
their dependence on $t$ and $\omega$ is implied. The
covariance is computed by integrating over a finite region
of the TF plane $\Delta T \times \Delta W$ as a two-dimensional
sliding window to provide an estimate of $\beta$ as a function
of $t$ and $\omega$:
$\hat{\beta}(t,\omega) = \frac{\int_{\Delta T}\int_{\Delta W} (\tilde{t}-t-\bar{t})(\tilde{\omega}-\omega-\bar{\omega}) \hat{P}(\tilde{t},\tilde{\omega}) \, d\tilde{t} \, d\tilde{\omega}}{\int_{\Delta T}\int_{\Delta W} (\tilde{t}-t-\bar{t})^2 \hat{P}(\tilde{t},\tilde{\omega}) \, d\tilde{t} \, d\tilde{\omega}}$. (20)
$\bar{t}$ and $\bar{\omega}$ are computed similarly. Integrating over a
larger region will provide better variance properties at
the expense of possible bias due to multiple signal components
with differing FM rates lying within the area
of integration.
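The local chirp-rate estimate can be sketched on a single TF patch as a weighted covariance. The sketch below is illustrative, not the author's implementation: it treats the patch coordinates as already centered on the analysis point, normalizes the local means by the patch mass, and uses a synthetic Gaussian ridge along $\omega = \beta t$ as the spectrogram. The function name and test patch are hypothetical.

```python
import numpy as np

def local_chirp_rate(P, t, w):
    """Ratio of the P-weighted time-frequency covariance to the time spread."""
    tt, ww = np.meshgrid(t, w, indexing="ij")
    mass = P.sum()
    t_bar = (tt * P).sum() / mass          # local average time
    w_bar = (ww * P).sum() / mass          # local average frequency
    cov_tw = ((tt - t_bar) * (ww - w_bar) * P).sum()
    var_t = ((tt - t_bar) ** 2 * P).sum()
    return cov_tw / var_t

# Synthetic patch: energy concentrated along the line w = beta * t.
t = np.linspace(-1.0, 1.0, 41)
w = np.linspace(-5.0, 5.0, 401)
beta_true = 2.0
P = np.exp(-(w[None, :] - beta_true * t[:, None]) ** 2 / (2 * 0.3 ** 2))
beta_hat = local_chirp_rate(P, t, w)
print(beta_hat)
```

In a sliding-window implementation the same function would be applied to each $\Delta T \times \Delta W$ patch of the multitaper spectrogram; a second ridge with a different slope inside the patch would bias the estimate, as noted in the text.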
Once $\beta(t,\omega)$ has been estimated, each STFT $S_k(t,\omega)$
is dechirped by locally convolving it with the Fourier
transform of :
s£(t,u) = J Skit, u- ey-^w^de. (21)
This convolution is shift-variant; at each frequency, a
new $\beta$ must be used. This convolution is equivalent
to matching the STFT to the local chirp rate. While
this convolution at first would appear to be an $O(N^2)$
operation, it can actually be implemented much more
efficiently. The equivalent chirp in the time domain
is of length T, the length of the STFT window. The
Fourier transform of this finite-length chirp will then
have bandwidth $\beta T$. Thus, if the average bandwidth
of the various FM components is $M = \beta T$ bins, an
STFT with N frequency samples can be dechirped with
only NM multiplies per time slice, comparable to the
computational complexity of the STFT itself. Once all
of the $S_k(t,\omega)$ are dechirped, the multitaper estimate
is constructed as usual.
5. F-TEST FOR
FREQUENCY-MODULATED TONES
The validity of the multitaper estimate rests on the
assumption that the TF spectrum is smoothly varying
over $\Delta T \times \Delta W$. This assumption is violated when
spectral lines (FM or otherwise) are present in the signal.
In this case, it is necessary to estimate the tones
and remove them from the signal. Ordinarily, estimating
a tone with unknown FM would be extremely difficult.
This task is made easier, however, by the local
matching described above. Once the individual STFT’s
$S_k(t,\omega)$ have been adapted to local FM, any frequency
modulated tones in the signal will behave exactly as a
stationary tone would behave in a non-adapted STFT.
As a result, an F-test for the existence of any FM tones
in the TF spectrum can be defined by directly extending
Thomson’s approach in the stationary case. The
expected value of the kth dechirped STFT for an FM
tone $\mu e^{j\phi(t)}$ with instantaneous frequency $\omega = \phi'(t)$ is:
$E[S_k(t,\omega)] = \mu U_k(0)$. (22)
The mean can then be estimated via regression:
$\hat{\mu}(t,\omega) = \frac{\sum_{k=1}^{K} U_k^*(0) S_k(t,\omega)}{\sum_{k=1}^{K} |U_k(0)|^2}$. (23)
The variance of this estimate is equal to the background
TF spectrum minus the spectral line, which is:
$\hat{P}(t,\omega) = \sum_{k=1}^{K} |S_k(t,\omega) - \hat{\mu}(t,\omega) U_k(0)|^2$. (24)
The F-test at time t is then given by the ratio of the
power of the spectral line and that of the background
spectrum:
$F(t,\omega) = \frac{(K-1) |\hat{\mu}(t,\omega)|^2 \sum_{k=1}^{K} |U_k(0)|^2}{\sum_{k=1}^{K} |S_k(t,\omega) - \hat{\mu}(t,\omega) U_k(0)|^2}$. (25)
Under the null hypothesis, the test quantity at a single
time is the ratio of two $\chi^2$ random variables with 2 and
2(K - 1) degrees of freedom. For a signal of length T
and an STFT of order N, there will be T/N independent
blocks of data. Thus, the final F-test will be a ratio
of $\chi^2$ random variables with 2T/N and 2(K-1)T/N
degrees of freedom, integrated along the contour specified
by $\omega = \phi'(t)$:
$F(\phi'(t)) = \frac{(K-1) \sum_{t} |\hat{\mu}(t,\phi'(t))|^2 \sum_{k=1}^{K} |U_k(0)|^2}{\sum_{t} \sum_{k=1}^{K} |S_k(t,\phi'(t)) - \hat{\mu}(t,\phi'(t)) U_k(0)|^2}$. (26)
If the F-test achieves the specified confidence level,
the tone should be removed by subtracting from the
STFT’s prior to forming the TF spectrum, then added
into the representation as an impulse:
Pit, w) = £(*, w)<*(w - <t>' (<)) + jy
k=i
\Sk(t,u)-Mt,u)Uk(<J-cl>’m\ (27)
Matching the STFTs to the local FM greatly simplifies
the F-test. With no matching, the STFT of an FM
tone will be spread according to the sweep rate, and
will thus have a functional form dependent on $\beta$. After
matching, the FM tone will have the same response as
a stationary tone in an unmatched STFT. Thus, the
expression for $\hat{\mu}$ in equation (23) can be used for all
FM rates. The procedure for testing for an FM tone
is then a four-step process: compute the test statistic
$F(t,\omega)$ over time and frequency; find candidate contours
$\omega(t) = \phi'(t)$ in $F(t,\omega)$; compute $F(\phi'(t))$; and
test its significance.
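The single-time-slice statistic can be sketched for the stationary-tone case, which is what a matched (dechirped) STFT is said to reduce to. This is a sketch of Thomson's regression F-test under stated assumptions, not the author's code: sine tapers stand in for the eigenfunctions $u_k$, $U_k(0)$ is taken as the sum of the kth taper, and the test signal is a complex tone on an FFT bin in complex white noise. All names are hypothetical.

```python
import numpy as np

def sine_tapers(T, K):
    # Orthonormal sine tapers, used here only as a stand-in taper family.
    n = np.arange(1, T + 1)
    k = np.arange(1, K + 1)[:, None]
    return np.sqrt(2.0 / (T + 1)) * np.sin(np.pi * k * n / (T + 1))

def harmonic_f_test(Sk, U0):
    """Sk: (K, n_freq) eigen-coefficients; U0: (K,) taper DC gains U_k(0)."""
    K = len(U0)
    energy = np.sum(np.abs(U0) ** 2)
    mu_hat = (U0.conj()[:, None] * Sk).sum(axis=0) / energy          # regression
    resid = np.sum(np.abs(Sk - mu_hat[None, :] * U0[:, None]) ** 2, axis=0)
    F = (K - 1) * np.abs(mu_hat) ** 2 * energy / resid                # line / background
    return mu_hat, F

rng = np.random.default_rng(1)
T, K, k0 = 256, 5, 32
n = np.arange(T)
noise = 0.1 * (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
x = np.exp(2j * np.pi * k0 * n / T) + noise      # unit-amplitude tone at bin k0
tapers = sine_tapers(T, K)
Sk = np.fft.fft(tapers * x[None, :], axis=-1)    # one slice of each eigen-STFT
U0 = tapers.sum(axis=1)
mu_hat, F = harmonic_f_test(Sk, U0)
print(abs(mu_hat[k0]), F[k0])
```

Under the null hypothesis the statistic follows an F distribution with 2 and 2(K-1) degrees of freedom, so a threshold can be read from standard tables; summing numerator and denominator along a candidate contour before taking the ratio gives the integrated test.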
REFERENCES
[1] D. J. Thomson, “Spectrum estimation and harmonic
analysis,” Proceedings of the IEEE, vol. 70,
pp. 1055-1096, 1982.
[2] D. Thomson and A. Chave, “Jackknifed error estimates
for spectra, coherences, and transfer functions,”
in Advances in Spectrum Analysis and Array
Processing, Vol. I (S. Haykin, ed.), pp. 58-113,
Prentice-Hall, 1991.
[3] M. Bayram and R. G. Baraniuk, “Multiple
window time-frequency analysis,” in Proceedings
of the IEEE-SP International Symposium on
Time-Frequency and Time-Scale Analysis, (Paris,
France), pp. 173-176, June 1996.
[4] I. Daubechies, “Time-frequency localization operators:
a geometric phase space approach,” IEEE
Transactions on Information Theory, vol. 34,
no. 4, pp. 605-612, 1992.
[5] F. Çakrak and P. Loughlin, “Multiple window
non-linear time-varying spectral analysis,” IEEE
Transactions on Signal Processing, 1999. Submitted.
[6] J. Pitton, “Nonstationary spectrum estimation
and time-frequency concentration,” in IEEE Conference
on Acoustics, Speech, and Signal Processing,
vol. IV, pp. 2425-2428, IEEE, 1998.
[7] J. Pitton, “Time-frequency spectrum estimation:
an adaptive multitaper method,” in IEEE Int.
Sym. Time-Frequency and Time-Scale Analysis,
(Pittsburgh, PA), pp. 665-668, 1998.
[8] J. Pitton, “Adaptive multitaper time-frequency
spectrum estimation,” in SPIE Advanced Sig.
Proc. Algs. Archs., Impl. VII, 1999.
[9] M. Priestley, Spectral Analysis of Time Series.
Academic Press - London, 1981.
[10] J. Pitton, “Linear and quadratic methods for positive
time-frequency distributions,” in IEEE Conference
on Acoustics, Speech, and Signal Processing,
vol. 5, pp. 3649-3652, IEEE, 1997.
[11] P. Flandrin, “On the positivity of the Wigner-Ville
spectrum,” Signal Processing, vol. 11, no. 2, 1985.
[12] R. Silverman, “Locally stationary random processes,”
IRE Transactions on Information Theory,
vol. IT-3, September 1957.
[13] J. Pitton, “The statistics of time-frequency analysis,”
Journal of the Franklin Institute, 2000. To
appear.
[14] E. Key, E. Fowle, and R. Haggarty, “A method
of designing signals of large time-bandwidth product,”
IRE Intern. Conv. Record, vol. IV, 1961.
OPTIMAL SUBSPACE SELECTION FOR NON-LINEAR PARAMETER
ESTIMATION APPLIED TO REFRACTIVITY FROM CLUTTER
Shawn Kraut and Jeffrey Krolik
Department of Electrical and Computer Engineering
Duke University, Box 90291
Durham, NC 27708-0291
E-mail: kraut@ee.duke.edu, jk@ee.duke.edu
ABSTRACT
We consider the problem of constructing an optimal
reduced-rank subspace for parameter estimation, in models
where the data is a non-linear function of the parameters.
The solution which minimizes mean-squared error
is a compromise between the prior distribution and
the measurement model, reducing to the Karhunen-Loeve
Transform when only the prior is considered.
The measurement model determines which parameters
the measured data is less sensitive to, and which are
therefore less estimatable. Our approach obtains parameterizations
in which the influence of these parameters
is reduced, so that limited resources may be allocated
to more estimatable features. We apply it to the
problem of estimating index-of-refraction profiles from
sea-surface clutter data.
1. INTRODUCTION
In this paper we will consider the problem of constructing
a reduced-dimension subspace in which to
search for parameter estimates $\hat{\theta}$. Non-linear models
of the form $y = L(\theta) + n$ are considered, where the
measured data $y$ depends on the parameter set $\theta$
through the non-linear model $L(\cdot)$, and is corrupted by
additive noise $n$. We will discuss this problem in the
specific case of estimating the tropospheric index of refraction
profile from clutter returns received from ship-based
microwave radars [1]. In the “refractivity from
clutter” (RFC) problem, the data $y$ consists of clutter
returns across range, and the description of propagation
through the refractivity profile yields the non-linear
model.
We ask the following question: what is the optimal
reduced-rank basis for searching for estimates of
the parameter set? From an engineering standpoint,
estimating the full refractivity profile would require a
search through a large-dimensional parameter space,
and would be too computationally slow for real-time
estimation of a dynamically varying profile. From a
modeling standpoint, we are interested in what reduced
parameterizations one should be estimating.
This work was supported by SPAWAR Systems Center, San
Diego, under contract No. N66001-97-D-5028. Presented at the
10th IEEE Workshop on Statistical Signal and Array Processing,
Pocono Manor, Pennsylvania, August 14-16, 2000.
The Karhunen-Loeve Transform (KLT) describes
the optimal reduced-rank linear subspace for minimizing
compression or representation error [2], by considering
the prior statistical distribution of the parameters.
The subspace is constructed from the dominant eigenvectors
of the prior covariance matrix of the parameter
vector, $R_{\theta\theta} = E[\theta\theta^\dagger]$ (with the mean of $\theta$ subtracted
out). The limitation of the KLT is that it does not incorporate
the estimation problem: what parameterizations
can be estimated from the data with the smallest
estimation error? If one were to consider estimation error
alone, then one would build the reduced-rank search
space from the model $L(\cdot)$, ignoring the prior. But the
resulting parameter basis functions might not represent
well the natural distribution of the parameters. In
the RFC example, profiles that are built from such a
basis will not necessarily look like natural, typically observed
index-of-refraction profiles (for an example, see
Figure 1).
Index of Refractivity Vertical Profile
Figure 1: A typical tropospheric index of refraction
profile, with a tri-linear shape characterized in part by
base height, duct height, and M-deficit.
The optimal basis, in an MSE sense, is a compromise
between the two considerations of estimation and representation
error. What is this basis? In the case of linear
models ($y = E\theta + n$), the problem has been investigated
and solved in two contexts. Examining Wiener filters,
in the form $R_{\theta y} R_{yy}^{-1}$, Scharf found that the optimal
(minimum mean-square error) reduced-rank Wiener filter
is given by truncating the singular-value decomposition
(SVD) of $R_{\theta y} R_{yy}^{-1/2}$, to give $\mathrm{trunc}[R_{\theta y} R_{yy}^{-1/2}] R_{yy}^{-1/2}$
(see [3], p. 330, and [4]). More recently, Hua et al.
suggested the generalized KLT (GKLT), constructed
from the dominant eigenvectors of $R_{\theta y} R_{yy}^{-1} R_{y\theta}$ (see [5, 6]).
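The truncated-SVD construction of the reduced-rank Wiener filter can be sketched numerically for a linear model. The sketch below is illustrative only: the model matrix `H`, the prior covariance, the noise covariance, and the dimensions are all made-up assumptions, and the inverse square root of $R_{yy}$ is formed from its eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 6, 8                                   # parameter and data dimensions
H = rng.standard_normal((m, p))               # hypothetical linear model y = H theta + n
A = rng.standard_normal((p, p))
R_tt = A @ A.T + 0.1 * np.eye(p)              # prior covariance of theta
R_nn = 0.5 * np.eye(m)                        # noise covariance
R_ty = R_tt @ H.T                             # cross-covariance of theta and y
R_yy = H @ R_tt @ H.T + R_nn

evals, evecs = np.linalg.eigh(R_yy)
R_yy_mhalf = evecs @ np.diag(evals ** -0.5) @ evecs.T     # R_yy^{-1/2}

def reduced_rank_wiener(r):
    # Keep the r largest singular components of R_ty R_yy^{-1/2}.
    U, s, Vt = np.linalg.svd(R_ty @ R_yy_mhalf, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r] @ R_yy_mhalf

def mse(W):
    # trace of E[(theta - W y)(theta - W y)^T]
    return np.trace(R_tt - W @ R_ty.T - R_ty @ W.T + W @ R_yy @ W.T)

errors = [mse(reduced_rank_wiener(r)) for r in range(1, p + 1)]
W_full = R_ty @ np.linalg.inv(R_yy)           # full-rank Wiener filter
print(np.round(errors, 4))
```

The printed errors decrease with rank, and at full rank the construction reproduces the ordinary Wiener filter, which is a quick consistency check on the truncation.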
We are considering the same problem, but in the
context of non-linear models. Furthermore, in the RFC
case we will discuss, a closed-form analytical model for
$L(\theta)$ is not available; the clutter return $y$ that would
result from a given profile $\theta$ must be computed numerically.
How then do we find the optimal reduced-rank
parameter basis?
2. ORTHOGONALITY CONDITIONS
Generally, one seeks a solution $\hat{\theta}$ that maximizes
some objective function C:
$\max_{\theta} C(y, L(\theta)) \rightarrow \hat{\theta}$. (1)
For example, for a MAP (maximum a posteriori) estimator,
C is the posterior probability density
function of $\theta$ given $y$. In this work we are seeking to identify
linear, reduced-rank parameterizations in the form
$\theta_r = U_r b$. Here $U_r$ is a “tall” matrix with orthonormal
columns, i.e. $U_r^\dagger U_r = I$. The problem is then
reduced to searching over candidate values of $b$:
$\max_{b} C(y, L(U_r b)) \rightarrow \hat{\theta}_r = U_r \hat{b}$, (2)
and the basic question is, how do we choose $U_r$?
Some useful results can be obtained by assuming the
following two axioms: both the full-rank and reduced-rank
estimators are uncorrelated with (orthogonal to)
the error of the full-rank estimator:
(A) $E[(\theta - \hat{\theta}) \hat{\theta}^\dagger] = 0$; and (B) $E[(\theta - \hat{\theta}) \hat{\theta}_r^\dagger] = 0$. (3)
The first condition is strictly true for the conditional
mean (CM) estimator, which is also the Minimum Mean-Squared
Error (MMSE) estimator, and for the Linear
MMSE estimator (Wiener filter). It can be shown that
the second condition is strictly true if $\hat{\theta}$ is constructed
from the MMSE estimator or LMMSE estimator,
and $\hat{\theta}_r$ is constructed from the same type of estimator
of $b = U_r^\dagger \theta$. This condition basically excludes $\hat{\theta}_r$ from
bringing in side information about $\theta$ that is not present
in $\hat{\theta}$. (In simple terms, we don’t have the situation
where $\hat{\theta}$ is a poor estimator, while $\hat{\theta}_r$ is simultaneously
based on a good estimator.)
A consequence of this condition is that the error
correlation of $\hat{\theta}_r$ is greater than that of $\hat{\theta}$:
$Q_r = Q + E[(\hat{\theta} - \hat{\theta}_r)(\hat{\theta} - \hat{\theta}_r)^\dagger] \geq Q$, (4)
where $Q = E[(\theta - \hat{\theta})(\theta - \hat{\theta})^\dagger]$. If we seek the reduced-rank
estimator that minimizes the residual MSE (trace
of $E[(\hat{\theta} - \hat{\theta}_r)(\hat{\theta} - \hat{\theta}_r)^\dagger]$), it can be shown that the error
correlation can be rewritten as
$Q_r = Q + (I - P_r) R_{\hat{\theta}\hat{\theta}} (I - P_r)$, (5)
where $R_{\hat{\theta}\hat{\theta}} = E[\hat{\theta}\hat{\theta}^\dagger]$ is the estimator correlation, and
$P_r = U_r U_r^\dagger$ is the projection onto the reduced-rank
subspace. Using the same argument as that taken for
the KLT, the reduced-rank subspace is then constructed
from the dominant eigenvectors of $R_{\hat{\theta}\hat{\theta}}$. It should be
noted that in the case of the linear model, this result reduces
to the “Generalized KLT” discussed in [3, 4, 5, 6];
i.e. $R_{\hat{\theta}\hat{\theta}}$ becomes $R_{\theta y} R_{yy}^{-1} R_{y\theta}$.
This solution is intuitively pleasing: to find a reduced-rank
subspace to search for parameter estimates, search
the subspace where the full-rank estimates naturally
tend to lie. Also, note that a consequence of the first
orthogonality condition is that the a priori covariance
of the parameter vector $\theta = (\theta - \hat{\theta}) + \hat{\theta}$ can be decomposed
into the correlation of the error $(\theta - \hat{\theta})$, and
the correlation of the full-rank estimator $\hat{\theta}$. This observation
can be written in the form of a “Pythagorean
Theorem”:
$R_{\theta\theta} = Q + R_{\hat{\theta}\hat{\theta}} \rightarrow R_{\hat{\theta}\hat{\theta}} = R_{\theta\theta} - Q$. (6)
In this formulation, we should take the dominant eigenvectors
of the difference between the a priori covariance
and the full-rank error correlation, which reduces to the
KLT in the limit that the error correlation becomes
small.
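For the linear-Gaussian special case, equations (5) and (6) can be checked numerically. The sketch below is an illustration under assumed quantities, not the RFC computation: the model matrix `H`, the covariances, and the dimensions are invented, and the estimator correlation is taken as $R_{\theta y} R_{yy}^{-1} R_{y\theta}$, the form the text gives for the linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, r = 6, 4, 2                              # parameters, data, reduced rank
H = rng.standard_normal((m, p))                # hypothetical linear model
A = rng.standard_normal((p, p))
R_tt = A @ A.T + 0.1 * np.eye(p)               # a priori covariance
R_nn = 0.5 * np.eye(m)
R_yy = H @ R_tt @ H.T + R_nn
R_ty = R_tt @ H.T

R_hat = R_ty @ np.linalg.solve(R_yy, R_ty.T)   # estimator correlation
Q = R_tt - R_hat                               # eq. (6): full-rank error correlation
# Independent check: information form of the LMMSE error covariance.
Q_check = np.linalg.inv(np.linalg.inv(R_tt) + H.T @ np.linalg.inv(R_nn) @ H)

evals, evecs = np.linalg.eigh(R_hat)
U_r = evecs[:, np.argsort(evals)[::-1][:r]]    # dominant eigenvectors
P_r = U_r @ U_r.T
I = np.eye(p)
Q_r = Q + (I - P_r) @ R_hat @ (I - P_r)        # eq. (5)
print(np.trace(Q), np.trace(Q_r))
```

The excess error, trace(Q_r) - trace(Q), equals the sum of the discarded eigenvalues of the estimator correlation, which is why the dominant eigenvectors are the ones to keep.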
3. CONSTRUCTING THE SUBSPACE IN
PRACTICE
Estimating the covariance matrix R^ over the full
parameter space may be computationally intensive in
practice, limited by the computation time of the prop¬
agation model L(-), over a set of values of 6 (either
grid points or realizations). Recently, closed-form ex¬
pressions were obtained for the Fisher information ma¬
trix in the RFC estimation problem [7], which could
in principle be used to approximate the full-rank er¬
ror correlation in Equation 6, i.e. Q<; J-1. However,
this approach is infeasible if the dimension of the ini¬
tial parameter vector 6 is too high, since the multi¬
dimensional numerical differentiation for the estimate
of J requires the evaluation of the nonlinear function
L(-) on a number of grid points that increases quadrat-
ically with the dimension.
An alternate approach taken here is to obtain samples
for a sample covariance matrix estimate of $R_{\hat\theta\hat\theta}$,
where each sample is an approximate conditional-mean
estimate $\hat\theta$, formulated as follows:
$$\hat\theta = \int d\theta\, \theta f(\theta|y)
= \frac{\int d\theta\, \theta f(y|\theta) f(\theta)}{\int d\theta\, f(y|\theta) f(\theta)}
\approx \frac{\sum_i \theta_i f(y|\theta_i)}{\sum_i f(y|\theta_i)}
= \sum_i w(y,\theta_i)\,\theta_i \qquad (7)$$
where $\theta_i$ are samples drawn from the prior $f(\theta)$, i.e.,
historical data, and $w(y,\theta_i)$ is a normalized weighting
factor, proportional to the likelihood. So an estimate is
obtained by averaging over historical profiles that are
weighted by their likelihood of producing the data $y$.
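The weighted sum of Equation 7 can be sketched as follows. This is an illustrative sketch only: the function name, the `spread` parameter, and the toy numbers are assumptions. Dividing the log-likelihood by `spread**2` corresponds to inflating the standard deviation of a Gaussian likelihood by `spread`, which is one way to keep a single sample from absorbing all of the weight.

```python
import numpy as np

def conditional_mean_estimate(theta_samples, log_like, spread=1.0):
    """Approximate E[theta | y] as a likelihood-weighted average of prior samples.

    theta_samples : (n, d) array of draws theta_i from the prior
    log_like      : (n,) array of log f(y | theta_i)
    spread        : hypothetical broadening factor; valid as a standard-deviation
                    inflation only under a Gaussian likelihood assumption
    """
    scaled = np.asarray(log_like, dtype=float) / spread**2
    scaled = scaled - scaled.max()     # guard against underflow in exp
    w = np.exp(scaled)
    w /= w.sum()                       # normalized weights w(y, theta_i)
    return w @ np.asarray(theta_samples, dtype=float), w

# Toy example: two equally likely samples and one very unlikely sample.
theta = np.array([[0.0], [1.0], [2.0]])
est, weights = conditional_mean_estimate(theta, [0.0, 0.0, -1000.0])
```

With a sharply peaked likelihood and few samples, the weights collapse onto one sample; increasing `spread` flattens them, at the cost of biasing the estimate toward the prior mean.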
4. APPLICATION: ESTIMATION OF
TROPOSPHERIC REFRACTIVITY
PROFILES
To evaluate this approach to rank-reduction, we
used profiles from the VOCAR data set, taken at three
sites off the coast of southern California in 1993 [1].
The most straightforward way to apply the KLT approach
is to simply take the profiles of M-values (modified
refractivity) over a uniform height grid, concatenate
them into the columns of a data matrix $\Theta$, and
use the resulting sample covariance $R_{EOF}$ both to generate
dominant eigenvectors/EOFs (empirical orthogonal
functions) and to generate new random
profiles for analysis. Unfortunately, a Gaussian random
model with covariance $R_{EOF}$ fails to reproduce
the characteristic tri-linear shape of observed profiles
(Figure 1; the second linear segment is responsible for
the downward refraction that causes ducting behavior).
In particular, the height of the duct (height of the first
two segments in the tri-linear profile) may vary
considerably, and averaging over an ensemble of observed
profiles tends to suppress the key feature of the duct; it
is “washed-out” in the sample mean (not shown here).
In addition, profiles synthesized from the sample mean
and $R_{EOF}$ tend to have many mini-ducts over the
entire height range, features not observed in real data.
Figure 2: An “octo-linear” fit of a profile, consisting of
eight linear segments (axis label: modified refractivity).
To formulate a random model that synthesizes real¬
istic profiles, and at the same time formulate an initial
profile parameterization, we fit each historical profile
to a profile consisting of eight linear segments (i.e., an
“octo-linear” profile), as shown in Figure 2. This
procedure fits the profile to a length-17 parameter vector
$\theta$, corresponding to the heights of the eight segments,
the widths of the eight segments (or M-deficits), and
the M-value at zero height (sea level).
The key characteristic of this fit is that it is feature-
based: referring to Figures 1 and 2, the top of the sec¬
ond segment was generally chosen to correspond to the
middle of the duct (the first two segments accounting
for base-height), and the top of the fourth segment was
chosen to correspond to the top of the duct (the first
four segments accounting for duct height). (For the
results shown here, the fit was obtained manually.) To
reduce a spurious source of variance in these parameters,
the historical profiles were edited to remove those
for which the main duct feature was not identifiable
(such as profiles that looked basically linear, with no
apparent duct).
Figure 3: The first four dominant eigenvectors of the
prior covariance, $R_{\theta\theta}$. Each panel contains three plots,
the profiles corresponding to (1) the mean (dashed), and
(2) and (3) the mean ± the eigenvector (scaled by the
same constant in all four panels). Axis label: height (m).
As might be expected, profiles synthesized from a
multivariate Gaussian model on feature-based parame¬
ters, rather than on the raw profiles, are more realistic
in terms of reproducing the gross shape of a typical
profile, including the main duct. To ensure positivity,
a log-normal model was used on the heights (the
appropriateness of which was verified by inspection of
histograms from real data). The multivariate-normal
model was then applied to the vector of log(heights)
and M-deficits.
Interestingly, the resulting mean, the dashed line
in the panels of Figure 3, looks very tri-linear. The
influence of the dominant eigenvectors of the prior
covariance $R_{\theta\theta}$ is depicted by the solid lines in
Figure 3. The first eigenvector corresponds to increasing
base-height while decreasing M-deficit in the tri-linear
model. The second has a lot of energy going
into shrinking and expanding the length of the top segment.
This by itself is a strict degeneracy: scaling of
the length of the final segment has no effect on the profile
and no effect on the clutter measurements used to
estimate the profile; it is only an artifact of the initial
parameterization scheme. Furthermore, for the
measurement method presumed for this study, measurement
of sea-surface clutter strength across range,
variations in the top half of the profile constitute an
effective estimation degeneracy, since they have little
effect on the ducting behavior and measured surface
clutter, and are therefore difficult to estimate.
Figure 4: The first four dominant eigenvectors of the
sample estimator covariance, $R_{\hat\theta\hat\theta}$.
We used a sample covariance approach to approximate
the estimator covariance $R_{\hat\theta\hat\theta}$ of Equation 5. A
likelihood function $f(y|\theta)$ is easily obtainable, as a
function of the propagation loss $L(\theta)$ from the transmitter
to the sea surface, across range (where the dimension
of $y$ and $L$ is the number of range cells). The problem
is that the PE (parabolic equation) numerical propagation
of the field is time intensive, severely limiting the
number of parameter values $\theta_i$ at which the propagation
loss can be evaluated.
To generate samples for a sample covariance, we
computed approximate conditional-mean estimates
$\hat\theta$, based on the weighted sum of Equation 7. In
practice, direct implementation of Equation 7 failed, since
the number of samples $\theta_i$ (10,000) was too small to
adequately sample the likelihood function, forcing one
weight $w_i$ to be unity, and the rest to be zero. This
effect was ameliorated by increasing the standard
deviation of the likelihood function by a factor of 35.
Figure 5: The mean-square-error (MSE) for the MAP
estimator over a grid (1) based on the prior covariance
(KLT) and (2) based on the estimator covariance.
The weighted sum can be interpreted as summing
over different profiles that reproduce well the observed
data. This in turn has the effect of averaging over, or
“washing out” , variations corresponding to the degen¬
eracies discussed above, which have less impact on the
measurement, and which are therefore less estimatable.
The sample covariance of the resulting estimates
has the eigenvectors shown in Figure 4. These eigenvectors
are qualitatively preferable to those of the prior
covariance, in terms of their physical interpretations:
the first corresponds to increasing duct height while
decreasing M-deficit, and the second to increasing base
height with increasing M-deficit. Note that the second
eigenvector of the prior covariance, in Figure 3 (with
energy going into scaling the top segment), is here most
closely approximated by the fourth eigenvector. So the
energy going into this degenerate, non-estimatable fea¬
ture has been reduced.
To quantitatively compare this parameterization with
that of the KLT, two grids consisting of 6000 points
were constructed from the dominant three eigenvectors
of the prior and estimator covariance, respectively. The
number of grid points was determined by the relative
energy of the eigenvectors, as reflected by the eigenval¬
ues; 25 x 16 x 15 and 40 x 15 x 10 grids were chosen for
the prior and estimator covariance eigenvectors, respec¬
tively. The mean-square-error decreases when MAP es¬
timates are found over the grid based on the estimator
covariance; see Figure 5.
5. CONCLUSIONS
In this paper we have discussed the problem of de¬
scribing the lower-dimensional parameterization of an
unknown parameter set that is optimal in the sense
of minimizing mean-squared error. This description,
in terms of a reduced-rank subspace, depends on both
the measurement model by which the data depends on
the parameters and on the a priori distribution of the
parameters. It can be viewed as a generalization of the
Karhunen-Loève Transform, which considers only the
prior. The initial parameterization and the nature of
the measurement model may contain parameters which
are degenerate in the sense that they have less impact
on the measured data. The aim of the approach pre¬
sented in this paper is to seek parameterizations in
which the strength of these parameters is decreased, so
that the reduced-dimension parameterization empha¬
sizes more estimatable features. We have evaluated
this procedure for the application of estimating index
of refraction profiles from clutter returns, where it pro¬
duces more physically meaningful reduced-rank basis
functions, and decreases mean-squared-error relative to
the KLT basis.
REFERENCES
[1] T. Rogers, “Effects of the variability of at¬
mospheric refractivity on propagation estimates,”
IEEE Trans. Antennas Propagat., vol. 44, pp. 460-
465, April 1996.
[2] C. W. Therrien, Discrete Random Signals and Sta¬
tistical Signal Processing, Signal Processing Series,
Ed. A. V. Oppenheim. Prentice Hall, 1992.
[3] L. L. Scharf, Statistical Signal Processing, Addison-
Wesley, 1991.
[4] L. L. Scharf, “The SVD and reduced rank signal
processing,” Signal Processing, vol. 25, pp. 113-133,
1991.
[5] Y. Hua and W. Liu, “Generalized Karhunen-Loeve
transform,” IEEE Signal Processing Letters, vol. 5,
no. 6, pp. 141-142, June 1998.
[6] Y. Hua and M. Nikpour, “Computing the reduced
rank Wiener filter by IQMD,” IEEE Signal Pro¬
cessing Letters, submitted.
[7] J. Tabrikian, “Theoretical performance limits on
tropospheric refractivity estimation using point-to-
point microwave measurements,” IEEE Trans. An¬
tennas Propagat., vol. 47, no. 11, pp. 1727-1734,
November 1999.
MAP MODEL ORDER SELECTION RULE FOR 2-D SINUSOIDS IN WHITE
NOISE
Mark A. Kliger and Joseph M. Francos
Department of Electrical and Computer Engineering
Ben-Gurion University
Beer-Sheva 84105, Israel.
ABSTRACT
We consider the problem of jointly estimating
the number as well as the parameters of two-
dimensional sinusoidal signals, observed in the
presence of an additive white Gaussian noise
field. Existing solutions to this problem are
based on model order selection rules, derived
for the parallel one-dimensional problem. These
criteria are then adapted to the two-dimensional
problem using heuristic arguments. Employ¬
ing asymptotic considerations, we derive in this
paper a maximum a-posteriori (MAP) model
order selection criterion for jointly estimating
the parameters of the two-dimensional sinusoids
and their number.
1. INTRODUCTION
From the 2-D Wold-like decomposition we have that
any 2-D regular and homogeneous discrete random field
can be represented as a sum of two mutually orthog¬
onal components: a purely-indeterministic field and a
deterministic one. The purely-indeterministic compo¬
nent has a unique white innovations driven moving av¬
erage representation. The deterministic component is
further orthogonally decomposed into a harmonic field
and a countable number of mutually orthogonal evanes¬
cent fields. In this paper we consider a special case of
the foregoing general problem. More specifically, we
consider the problem of jointly estimating the num¬
ber as well as the parameters of the sinusoidal signals
comprising the harmonic component of the field, in the
presence of the purely-indeterministic component, as¬
sumed here to be a white noise field.
This work was supported in part by the Israel Ministry of
Science under Grant 1233198.
A solution to this problem is an essential component
in many image processing and multimedia data
processing applications. For example, in indexing and
retrieval systems of multimedia data that employ the
textural information in the imagery components of the
data, e.g., [7], the identification of similar textured sur¬
faces as being such, is highly sensitive to errors in es¬
timating the orders of the models of the deterministic
components of the textures. More specifically, in this
approach the 2-D Wold decomposition based paramet¬
ric model of each textured segment of the image also
serves as the index of this segment. Therefore an accu¬
rate and robust procedure for estimating the orders as
well as the parameters of the models of the determin¬
istic components of the textures is an essential compo¬
nent in any such indexing and retrieval system. Simi¬
lar requirements are posed by parametric content-based
image coding and representation methods.
The same type of problem, i.e., joint estimation of
the model order and parameters for a sum of 2-D si¬
nusoidal signals observed in additive noise, naturally
arises in processing 2-D SAR data. In this problem
however the observed random field is complex valued,
where for each scatterer one frequency parameter cor¬
responds to the range information, while the second
frequency parameter is the Doppler. The complex val¬
ued amplitude of each such exponential is proportional
to the radar cross section of the target.
Many algorithms have been devised to estimate the
parameters of sinusoids observed in additive white Gaus¬
sian noise. Most of the algorithms assume that the
number of sinusoids is a-priori known. However this
assumption does not always hold in practice. Hence, in
the past two decades the model order selection problem
has received considerable attention. In general, model
order selection rules are based (directly or indirectly)
on three popular criteria: Akaike information criterion
(AIC), the minimum description length (MDL) and
the maximum a-posteriori probability (MAP) criterion.
All these criteria have a common form in that they com¬
prise two terms: a data term and a penalty term, where
the data term is the log-likelihood function evaluated
for the assumed model. However, most of the papers
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
dedicated to this problem discuss the model order selec¬
tion problem for various models of one-dimensional sig¬
nals, while the problem of modeling multidimensional
fields has received considerably less attention. Djuric,
[1], proposed a MAP order selection rule for 1-D sinu¬
soids observed in additive white noise. Kavalieris and
Hannan, [4], prove the strong consistency of a crite¬
rion, that indirectly employs the MDL principle. In
this framework the observation noise is modeled as an
autoregression of an unknown order. In the special
case where the noise process in [4] is assumed to be a
white noise process, the resulting criterion is identical
to the MAP criterion derived in [1]. Stoica et al, [5]
proposed the cross-validation selection rule and demon¬
strated its asymptotic equivalence to the Generalized
Akaike Information Criterion (GAIC). In [6] this crite¬
rion is applied to the 2-D problem as well, where the
penalty term is proportional to the total number of un¬
known parameters, exactly as in the 1-D case. In this
paper we derive a MAP model order selection criterion
for jointly estimating the number and the parameters
of two-dimensional sinusoids observed in additive white
noise.
The paper is organized as follows. In Section 2 we
define our notations, while in Section 3 we formally
define the MAP model order selection problem. The
MAP model order selection criterion is derived in Sec¬
tion 4. Finally, in Section 5 we provide some numerical
examples and Monte-Carlo simulations to better illus¬
trate the performance of the proposed criterion.
2. NOTATIONS AND DEFINITIONS
The considered random field is composed of a harmonic
field embedded in Gaussian noise. Let $\{y(n,m)\}$,
where $(n,m) \in U$ and $U = \{(n,m) \mid 0 \le n \le S-1,\ 0 \le m \le T-1\}$,
be the observed $S \times T$ real valued data
field. The elements of $y(n,m)$ may be represented as
$$y(n,m) = h(n,m) + u(n,m). \qquad (1)$$
The field $\{u(n,m)\}$ is the 2-D zero mean, Gaussian
white noise field with variance $\sigma^2$. The field $\{h(n,m)\}$
is the harmonic random field
$$h(n,m) = \sum_{i=1}^{k} C_i \cos(n\omega_i + m\nu_i) + G_i \sin(n\omega_i + m\nu_i) \qquad (2)$$
where $k$ denotes the number of sinusoidal components
in the data model, and $(\omega_i, \nu_i)$ is the spatial frequency
of the $i$th component. The $C_i$'s and $G_i$'s are the
amplitudes of the sinusoidal components in the observed
realization.
Let us define the following matrix notations:
$$\mathbf{y} = [y(0,0), \ldots, y(0,T-1), y(1,0), \ldots, y(S-1,T-1)]^T. \qquad (3)$$
The vectors $\mathbf{u}$ and $\mathbf{h}$ are similarly defined. Rewriting
(1) we have $\mathbf{y} = \mathbf{h} + \mathbf{u}$. Let $\mathbf{A}$ denote the covariance
matrix of $\mathbf{y}$. Thus
$$\mathbf{A} = \sigma^2 \mathbf{I}_{ST \times ST} \qquad (4)$$
where $\mathbf{I}_{ST \times ST}$ is an $ST \times ST$ identity matrix. Hence,
$|\mathbf{A}| = \sigma^{2ST}$. Also define
$$\mathbf{a} = [C_1, G_1, C_2, G_2, \ldots, C_k, G_k]^T. \qquad (5)$$
Let
$$\mathbf{A}_i = \left[e^{j[0\omega_i + 0\nu_i]}, \ldots, e^{j[0\omega_i + (T-1)\nu_i]}, \ldots, e^{j[(S-1)\omega_i + (T-1)\nu_i]}\right]^T \qquad (6)$$
and let us define the following $ST \times 2k$ matrix
$$\mathbf{D} = [\mathrm{Re}(\mathbf{A}_1), \mathrm{Im}(\mathbf{A}_1), \mathrm{Re}(\mathbf{A}_2), \mathrm{Im}(\mathbf{A}_2), \ldots, \mathrm{Re}(\mathbf{A}_k), \mathrm{Im}(\mathbf{A}_k)]. \qquad (7)$$
Using the foregoing notations we have that
$$\mathbf{y} = \mathbf{D}\mathbf{a} + \mathbf{u}. \qquad (8)$$
In the following it is assumed that the matrix $\mathbf{D}^T\mathbf{D}$ is
full rank.
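The matrix notation of (3) through (8) can be made concrete with a short sketch. The helper name `build_D`, the field size, the frequencies, and the amplitudes below are illustrative assumptions, not values from the paper; the row ordering follows the row-major stacking in (3).

```python
import numpy as np

def build_D(S, T, omegas, nus):
    """Build the ST x 2k matrix D of (7) for the given frequency pairs."""
    n, m = np.meshgrid(np.arange(S), np.arange(T), indexing="ij")
    n, m = n.ravel(), m.ravel()          # order (0,0),...,(0,T-1),(1,0),... as in (3)
    cols = []
    for w, v in zip(omegas, nus):
        A_i = np.exp(1j * (n * w + m * v))   # the vector A_i of (6)
        cols += [A_i.real, A_i.imag]         # cosine and sine columns
    return np.column_stack(cols)

# Hypothetical example: k = 2 components on an 8 x 8 field.
S, T = 8, 8
omegas, nus = [0.9, 2.1], [1.3, 0.4]
a = np.array([1.0, 1.0, 0.5, -0.5])      # [C_1, G_1, C_2, G_2] as in (5)
D = build_D(S, T, omegas, nus)
rng = np.random.default_rng(0)
sigma = 0.1                              # assumed noise standard deviation
y = D @ a + sigma * rng.standard_normal(S * T)   # observation model (8)
```

Each pair of columns of `D` holds the cosine and sine of one component evaluated over the whole field, so `D @ a` reproduces the harmonic field of (2) sample by sample.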
3. MAP MODEL ORDER SELECTION
CRITERION
Let $p(k)$ be the a-priori probability that there exist
$k$ sinusoidal components in the observed field. It is
assumed that there are $Q$ competing models, where
$Q > M$ ($M$ being the actual number of sinusoidal
components), and that each model is equiprobable. That
is
$$p(k) = \frac{1}{Q}, \quad k \in Z_Q \qquad (9)$$
where $Z_Q = \{0, 1, 2, \ldots, Q\}$. The MAP estimate of
$M$ is the value of $k$ that maximizes the a-posteriori
probability $p(k|\mathbf{y})$, where $k \in Z_Q$. More specifically,
$$\hat{M}_{MAP} = \arg\max_{k \in Z_Q} \{p(k|\mathbf{y})\}
= \arg\max_{k \in Z_Q} \left\{\frac{p(\mathbf{y}|k)\,p(k)}{p(\mathbf{y})}\right\}
= \arg\max_{k \in Z_Q} \{p(\mathbf{y}|k)\}
= \arg\max_{k \in Z_Q} \{\ln p(\mathbf{y}|k)\} \qquad (10)$$
where $p(\mathbf{y}|k)$ denotes the conditional probability of $\mathbf{y}$
given that there are $k$ sinusoidal components in the
data.
Let
$$\mathbf{W} = [\omega_1, \omega_2, \ldots, \omega_k, \nu_1, \nu_2, \ldots, \nu_k]^T. \qquad (11)$$
Also let $\mathcal{R}_+$ denote the positive real line, let
$\mathcal{A}_k = \mathcal{R}^{2k}$, and let $\Omega_k = ([0, 2\pi))^{2k}$. Thus, we have that
$\sigma \in \mathcal{R}_+$, $\mathbf{a} \in \mathcal{A}_k$, and $\mathbf{W} \in \Omega_k$. Using these notations
the conditional probability density $p(\mathbf{y}|k)$ is expressed
by
$$p(\mathbf{y}|k) = \int_{\Omega_k} \int_{\mathcal{R}_+} \int_{\mathcal{A}_k} p(\mathbf{y}|k, \mathbf{W}, \sigma, \mathbf{a})\, p(\mathbf{W}, \sigma, \mathbf{a}|k)\, d\mathbf{a}\, d\sigma\, d\mathbf{W} \qquad (12)$$
where $p(\mathbf{W}, \sigma, \mathbf{a}|k)$ is the a-priori probability of $\mathbf{W}$, $\sigma$
and $\mathbf{a}$ given there exist $k$ sinusoidal components in the
observed data.
4. DERIVATION OF THE CRITERION
4.1. Priors Selection
Inspecting (10) and (12) we conclude that finding $\hat{M}_{MAP}$
using the observed data only requires that some
assumptions be made regarding the prior distribution of
the model parameters, $p(\mathbf{W}, \sigma, \mathbf{a}|k)$. Clearly our goal
is to derive a model selection rule that will be based
on a non-informative prior about the parameters. In
other words, the selected prior should be chosen such
that it represents the lack of a-priori knowledge of the
values of problem parameters, before the data is
observed. (See, e.g., [2] for a detailed discussion of the
problem of choosing non-informative priors).
Clearly,
$$p(\mathbf{W}, \sigma, \mathbf{a}|k) = p(\sigma, \mathbf{a}|\mathbf{W}, k)\, p(\mathbf{W}|k). \qquad (13)$$
Since the sinusoidal frequencies are assumed independent
of each other (i.e., they are not harmonically
related), the lack of a-priori knowledge of the frequencies
is modeled by assuming the frequencies $(\omega_i, \nu_i)$ to
be uniformly distributed on $\Omega_k$. Thus,
$$p(\mathbf{W}|k) = \frac{1}{(2\pi)^{2k}}. \qquad (14)$$
Note that since the probability of $\omega_i$ being equal to $\omega_j$
for some $i \ne j$ is zero (and similarly for $\nu_i$ being equal
to $\nu_j$), we assume in the following that for all $i \ne j$,
$\omega_i \ne \omega_j$ (and similarly $\nu_i \ne \nu_j$). Hence the following
derivation of the model order selection criterion holds
almost everywhere in the problem probability space,
i.e., except for a set of models of probability measure
zero.
Given that $\mathbf{W}$ and $k$ are known, $\mathbf{D}$ is also known
and the observation model (8) becomes a linear regression
model where the observations are subject to a zero
mean white Gaussian observation noise with variance
$\sigma^2$, such that $\mathbf{a}$, $\sigma$ are unknown. For this problem it is
shown in [2] that in the space defined by $\mathbf{a}$ and $\ln\sigma$ the
shape of the likelihood function surface is “data
translated”, i.e., it is invariant to translations that result
from the different values these parameters assume in
different realizations of the observed data. Hence the
idea that little is known a-priori relative to the
information contained in the observed data is expressed by
choosing a prior distribution such that $p(\ln\sigma, \mathbf{a}|\mathbf{W}, k)$
is locally uniform, or equivalently that
$$p(\sigma, \mathbf{a}|\mathbf{W}, k) \propto \sigma^{-1}. \qquad (15)$$
Substituting (14) and (15) into (13) we have that the
desired non-informative prior is given by
$$p(\mathbf{W}, \sigma, \mathbf{a}|k) \propto \frac{1}{(2\pi)^{2k}\,\sigma}. \qquad (16)$$
4.2. Evaluation of the a-Posteriori Distribution
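In this subsection we derive an approximate expression
for the a-posteriori probability distribution $p(\mathbf{y}|k)$
given in (12). Since the noise field $\{u(n,m)\}$ is Gaussian
we have using (4) and (8)
$$p(\mathbf{y}|k, \mathbf{W}, \sigma, \mathbf{a}) = p(\mathbf{u}|\sigma)
= (2\pi\sigma^2)^{-\frac{ST}{2}} \exp\left\{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{D}\mathbf{a})^T(\mathbf{y} - \mathbf{D}\mathbf{a})\right\}. \qquad (17)$$
Let $\hat{\mathbf{a}} = (\mathbf{D}^T\mathbf{D})^{-1}\mathbf{D}^T\mathbf{y}$ and let $\mathbf{P}^{\perp}$ denote the
projection matrix defined by
$$\mathbf{P}^{\perp} = \mathbf{I} - \mathbf{D}(\mathbf{D}^T\mathbf{D})^{-1}\mathbf{D}^T. \qquad (18)$$
Using these notations we have that
$$(\mathbf{y} - \mathbf{D}\mathbf{a})^T(\mathbf{y} - \mathbf{D}\mathbf{a}) = \mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y} + (\mathbf{a} - \hat{\mathbf{a}})^T\mathbf{D}^T\mathbf{D}(\mathbf{a} - \hat{\mathbf{a}}). \qquad (19)$$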
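Applying the prior (16) and evaluating the marginal
distribution we have
$$p(\mathbf{y}, \mathbf{W}, \sigma|k) = \int_{\mathcal{A}_k} p(\mathbf{y}|k, \mathbf{W}, \sigma, \mathbf{a})\, p(\mathbf{W}, \sigma, \mathbf{a}|k)\, d\mathbf{a}$$
$$\propto \int_{\mathcal{A}_k} (2\pi\sigma^2)^{-\frac{ST}{2}} \exp\left\{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{D}\mathbf{a})^T(\mathbf{y} - \mathbf{D}\mathbf{a})\right\} \frac{1}{(2\pi)^{2k}\sigma}\, d\mathbf{a}$$
$$= (2\pi\sigma^2)^{-\frac{ST}{2}} \frac{1}{(2\pi)^{2k}\sigma} \exp\left\{-\frac{1}{2\sigma^2}\mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y}\right\}
\int_{\mathcal{A}_k} \exp\left\{-\frac{1}{2\sigma^2}(\mathbf{a} - \hat{\mathbf{a}})^T\mathbf{D}^T\mathbf{D}(\mathbf{a} - \hat{\mathbf{a}})\right\} d\mathbf{a}$$
$$= (2\pi\sigma^2)^{-\frac{ST}{2}} \frac{1}{(2\pi)^{2k}\sigma} \exp\left\{-\frac{1}{2\sigma^2}\mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y}\right\}
\frac{(\sqrt{2\pi}\,\sigma)^{2k}}{|\mathbf{D}^T\mathbf{D}|^{1/2}}. \qquad (20)$$
Next, we evaluate $p(\mathbf{y}, \mathbf{W}|k)$. Substituting (20) we
have
$$p(\mathbf{y}, \mathbf{W}|k) = \int_{\mathcal{R}_+} p(\mathbf{y}, \mathbf{W}, \sigma|k)\, d\sigma
\propto 2^{-2k-1}\, \pi^{-\frac{ST+2k}{2}}\, \Gamma\!\left(\frac{ST-2k}{2}\right) |\mathbf{D}^T\mathbf{D}|^{-\frac{1}{2}} (\mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y})^{-\frac{ST-2k}{2}} \qquad (21)$$
where $\Gamma(\cdot)$ is the standard Gamma function (see, e.g.,
[2] for the integration result).
Finally, to obtain an expression for the conditional
probability $p(\mathbf{y}|k)$ we have to evaluate
$$p(\mathbf{y}|k) = \int_{\Omega_k} p(\mathbf{y}, \mathbf{W}|k)\, d\mathbf{W}. \qquad (22)$$
Since a direct analytic solution to this integration problem
does not exist, we derive an approximate solution,
employing the Laplace integration method (see, e.g.,
[3]). Following [3], p. 71, we first expand $\ln p(\mathbf{y}, \mathbf{W}|k)$
into a Taylor series about $\hat{\mathbf{W}}$, where $\hat{\mathbf{W}}$ denotes the
ML estimate of $\mathbf{W}$. Since $\hat{\mathbf{W}}$ is a maximum point of
the likelihood function, the first order derivatives of
$\frac{1}{ST}\ln p(\mathbf{y}, \mathbf{W}|k)$ at this point vanish. Omitting from
the expansion terms of order higher than two, we have
$$p(\mathbf{y}, \mathbf{W}|k) = \exp\left\{ST\, \frac{1}{ST}\ln p(\mathbf{y}, \mathbf{W}|k)\right\}
\approx \exp\left\{\ln p(\mathbf{y}, \hat{\mathbf{W}}|k) - \frac{ST}{2}(\mathbf{W} - \hat{\mathbf{W}})^T \mathbf{H}_{ML} (\mathbf{W} - \hat{\mathbf{W}})\right\} \qquad (23)$$
where
$$\mathbf{H}_{ML} = -\frac{1}{ST}
\begin{bmatrix}
\frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_1^2} & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_1 \partial W_2} & \cdots & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_1 \partial W_{2k}} \\
\frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_2 \partial W_1} & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_2^2} & \cdots & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_2 \partial W_{2k}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_{2k} \partial W_1} & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_{2k} \partial W_2} & \cdots & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_{2k}^2}
\end{bmatrix}_{\mathbf{W} = \hat{\mathbf{W}}} \qquad (24)$$
is the Hessian matrix of $-\frac{1}{ST}\ln p(\mathbf{y}, \mathbf{W}|k)$ evaluated at
$\mathbf{W} = \hat{\mathbf{W}}$. As $\hat{\mathbf{W}}$ is a maximum point of $\ln p(\mathbf{y}, \mathbf{W}|k)$,
$\mathbf{H}_{ML}$ is positive definite. Since $\ln p(\mathbf{y}, \mathbf{W}|k)$ is assumed
sufficiently smooth at $\hat{\mathbf{W}}$, $\mathbf{H}_{ML}$ is symmetric.
Substituting (23) into (22) and employing the Laplace
asymptotic approximation we have that as $ST \to \infty$
$$p(\mathbf{y}|k) = \int_{\Omega_k} \exp\left\{ST\, \frac{1}{ST}\ln p(\mathbf{y}, \mathbf{W}|k)\right\} d\mathbf{W}
\approx p(\mathbf{y}, \hat{\mathbf{W}}|k)\,(2\pi)^k\, |\mathbf{H}_{ML}|^{-\frac{1}{2}}\,(ST)^{-k}. \qquad (25)$$
Lemma 1
$$|\mathbf{H}_{ML}| = O(S^{2k}T^{2k}). \qquad (26)$$
Proof: See [9].
Substituting (21) and (26) into (25) we have
$$p(\mathbf{y}|k) \propto 2^{-k-1}\, \pi^{-\frac{ST}{2}}\, \Gamma\!\left(\frac{ST-2k}{2}\right) |\hat{\mathbf{D}}^T\hat{\mathbf{D}}|^{-\frac{1}{2}}
(\mathbf{y}^T\hat{\mathbf{P}}^{\perp}\mathbf{y})^{-\frac{ST-2k}{2}}\, S^{-2k}T^{-2k} \qquad (27)$$
where $\hat{\mathbf{D}}$ and $\hat{\mathbf{P}}^{\perp}$ are the matrices $\mathbf{D}$ and $\mathbf{P}^{\perp}$, respectively,
with $\mathbf{W}$ substituted by its ML estimate, $\hat{\mathbf{W}}$. It
is possible to further simplify (27) by observing that
$|\hat{\mathbf{D}}^T\hat{\mathbf{D}}| = O(S^{2k}T^{2k})$ (see [9]). Furthermore, employing
the asymptotic properties of the Gamma function
(see, e.g., [8], p. 31) we have that as $ST \to \infty$,
$$\Gamma\!\left(\frac{ST-2k}{2}\right) \approx \sqrt{2\pi}\left(\frac{ST-2k}{2}\right)^{\frac{ST-2k-1}{2}} e^{-\frac{ST-2k}{2}}. \qquad (28)$$
Substituting these approximations into (27), and omitting
terms that are independent of $k$, the final form
of the model order selection criterion can be readily
established:
$$\hat{M}_{MAP} = \arg\min_{k \in Z_Q} \{-\ln p(\mathbf{y}|k)\}$$
$$= \arg\min_{k \in Z_Q} \left\{\frac{ST-2k}{2}\ln(\mathbf{y}^T\hat{\mathbf{P}}^{\perp}\mathbf{y}) + \frac{1}{2}\ln|\hat{\mathbf{D}}^T\hat{\mathbf{D}}| + k\ln\frac{ST}{2} + 2k\ln ST + (k+1)\ln 2\right\}$$
$$= \arg\min_{k \in Z_Q} \left\{\frac{ST-2k}{2}\ln(\mathbf{y}^T\hat{\mathbf{P}}^{\perp}\mathbf{y}) + 4k\ln ST\right\}. \qquad (29)$$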
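The selection rule (29) can be sketched as a function of the data and a candidate set of frequencies. This is a simplified illustration under a strong assumption: the ML frequency search for each order is omitted, and each candidate order is simply handed a fixed list of frequency pairs. The function names, field size, frequencies, and noise level are all hypothetical.

```python
import numpy as np

def design_matrix(S, T, freqs):
    """ST x 2k matrix of cosine/sine columns for the frequency pairs in `freqs`."""
    n, m = np.meshgrid(np.arange(S), np.arange(T), indexing="ij")
    n, m = n.ravel(), m.ravel()
    cols = []
    for w, v in freqs:
        phase = n * w + m * v
        cols += [np.cos(phase), np.sin(phase)]
    return np.column_stack(cols) if cols else np.zeros((S * T, 0))

def map_criterion(y, S, T, freqs):
    """((ST - 2k)/2) ln(y^T P_perp y) + 4k ln(ST), with k = len(freqs)."""
    k = len(freqs)
    if k:
        D = design_matrix(S, T, freqs)
        a_hat, *_ = np.linalg.lstsq(D, y, rcond=None)   # least-squares amplitudes
        resid = y - D @ a_hat
    else:
        resid = y
    rss = float(resid @ resid)                          # equals y^T P_perp y
    return 0.5 * (S * T - 2 * k) * np.log(rss) + 4 * k * np.log(S * T)

# Hypothetical experiment: two true components, candidate orders 0..3.
S = T = 16
true_freqs = [(0.9, 1.3), (2.1, 0.4)]
rng = np.random.default_rng(1)
y = design_matrix(S, T, true_freqs) @ np.ones(4) + 0.5 * rng.standard_normal(S * T)
candidates = {0: [], 1: true_freqs[:1], 2: true_freqs,
              3: true_freqs + [(1.5, 2.5)]}            # order 3 adds a spurious pair
scores = {k: map_criterion(y, S, T, f) for k, f in candidates.items()}
best_k = min(scores, key=scores.get)
```

The data term falls sharply while true components are being added and only marginally once a spurious one is included, so the `4k ln(ST)` penalty decides the comparison between the correct order and the over-fitted one.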
5. NUMERICAL RESULTS
To illustrate the performance of the proposed model
order selection rule we present some numerical exam¬
ples. In the examples below, the data field was gen¬
erated with four equiamplitude sinusoidal components,
and we define
SNR_i = 10 log
C_i^2 + G_i^2
The noise is a white Gaussian noise field with variance
$\sigma^2$ which is chosen to yield the desired signal to noise
ratio. In these experiments the signal to noise ratio of
each component, SNR_i, varies in the range of -15 dB to
-5 dB, in steps of 1 dB. For each SNR, 100 Monte-Carlo
experiments are performed. The data field dimensions
are 32 x 32. The frequencies of the sinusoidal components
are $(-2\pi \cdot 0.155,\ 2\pi \cdot 0.253)$, $(-2\pi \cdot 0.155,\ 2\pi \cdot 0.296)$,
$(-2\pi \cdot 0.112,\ 2\pi \cdot 0.274)$, $(2\pi \cdot 0.112,\ 2\pi \cdot 0.201)$. Their
amplitudes are given by $C_i = G_i = 1$, $i = 1, \ldots, 4$. The
performance results of the proposed MAP selection
criterion are summarized in Table 1 for various values of
SNR_i. For comparison, the performance results of the
GAIC criterion, [6], are listed as well. To further il¬
lustrate the performance of the proposed MAP model
order selection criterion, the probabilities of correct
model order selection for the two criteria are depicted
in Fig. 1. The simulation results demonstrate that
even for modest dimensions of the observed field, and
relatively low SNR’s, i.e., as low as -9dB, the error rates
of both the MAP and the GAIC model order selection
criteria are very low. The performance of the MAP rule
is shown to be better than that of the GAIC for lower
SNR’s. Furthermore, the results indicate that for the
lower SNR range, the probability of correct model order
selection by the MAP criterion is not only higher, but
also that the magnitude of the error is much smaller
than in the case of the GAIC model order estimate.
SNR_i     Criterion   k=1   k=2   k=3   k=4
-15 dB    MAP          29    34    29     8
          GAIC         94     6     0     0
-14 dB    MAP           4    27    45    24
          GAIC         86    12     2     0
-13 dB    MAP           3    13    46    38
          GAIC         57    33     8     2
-12 dB    MAP           0     3    18    79
          GAIC         22    23    27    28
-11 dB    MAP           0     0     7    93
          GAIC          2     6    21    71
-10 dB    MAP           0     0     4    96
          GAIC          0     0     8    92
-9 dB     MAP           0     0     0   100
          GAIC          0     0     0   100
Table 1: Performance comparison of MAP and GAIC
criteria for various values of SNR_i.
Figure 1: Probabilities of correct model
order selection. The solid and the dashed
lines represent the MAP and the GAIC
performance curves, respectively.
6. REFERENCES
[1] P. M. Djuric, “A Model Selection Rule for Sinusoids
in White Gaussian Noise,” IEEE Trans. Signal
Process., vol. 44, pp. 1744-1751, 1996.
[2] G. E. P. Box and G. C. Tiao, Bayesian Inference
in Statistical Analysis., New York: Wiley, 1992.
[3] N. G. De Bruijn, Asymptotic Methods in Analysis,
3rd edition, Amsterdam: North-Holland Publish¬
ing Co., 1970.
[4] L. Kavalieris and E. J. Hannan, “Determining the
Number of Terms in a Trigonometric Regression,”
J. Time Series Anal., vol. 15, pp. 613-625, 1994.
[5] P. Stoica, P. Eykhoff, P. Janssen and T. Soder-
strom, “Model-Structure Selection by Cross-
Validation,” Int. J. Control , vol. 43, pp. 1841-
1878, 1986.
[6] J. Li and P. Stoica, “Efficient Mixed-Spectrum Es¬
timation with Application to Target Feature Ex¬
traction,” IEEE Trans. Signal Process., vol. 44,
pp. 281-295, 1996.
[7] R. Stoica, J. Zerubia and J. M. Francos, “The
Two-Dimensional Wold Decomposition for Segmentation
and Indexing in Image Libraries,” Int.
Conf. Acoust., Speech, Signal Processing, Seattle,
1998.
[8] E. D. Rainville, Special Functions, MacMillan,
New York, 1967.
[9] M. Kliger and J. M. Francos, “MAP Model Order
Selection Criterion for 2-D Sinusoids in Noise,” in
preparation.
OPTIMUM LINEAR PERIODICALLY TIME-VARYING FILTER
Dong Wei
Center for Telecommunications and Information Networking
Department of Electrical and Computer Engineering, Drexel University
Philadelphia, PA 19104 U.S.A.
E-mails: wei@ece.drexel.edu
ABSTRACT
We study the optimum (in the minimum mean-square
error sense) linear periodically time-varying deconvo¬
lution filter of finite size. We show that the filter can
be in the form of lapped transform or multirate filter-
bank, and it includes the FIR Wiener filter as a special
case. We demonstrate that the proposed filter always
possesses a gain over the Wiener filter.
1. INTRODUCTION
Consider the discrete-time model
$$x[n] = (s * h)[n] + w[n] \qquad (1)$$
$$\phantom{x[n]} = \sum_{m=0}^{N-1} h[m]\, s[n-m] + w[n] \qquad (2)$$
where $s[n]$ is the original signal, $h[n]$ is a known linear
time-invariant (LTI) system with $N$ taps, $w[n]$ is the
additive noise, $x[n]$ is the observed data, and the symbol
$*$ denotes convolution. We assume that both $s[n]$
and $w[n]$ are zero-mean, wide-sense stationary,
second-order random processes, their second-order statistics
are known, and they are uncorrelated, i.e.,
$$E\{s[n]\, w^*[k]\} = 0 \qquad (3)$$
for any n and k. Such a model has been widely used in
signal processing applications such as filtering, smooth¬
ing, prediction, noise canceling, and deconvolution, just
to name a few.
The goal is to estimate the signal $s[n]$ from the
noisy, filtered data $x[n]$. An LTI finite impulse response
(FIR) filter $f[n]$ can be applied to $x[n]$. The resulting
estimate of $s[n]$ is given by
$$\hat{s}[n] = (x * f)[n] \qquad (4)$$
$$\phantom{\hat{s}[n]} = \sum_{m=0}^{K-1} f[m]\, x[n-m] \qquad (5)$$
where $K$ is the length of $f[n]$. The FIR Wiener
deconvolution filter is the optimum LTI FIR system (denoted
by the vector $\mathbf{f}_{opt}$) in the minimum mean-square error
(MMSE) sense:
$$\mathbf{f}_{opt} = \arg\min_{\mathbf{f}} E\{|\hat{s}[n] - s[n]|^2\} \qquad (6)$$
where
$$\mathbf{f} = [f[0]\ f[1]\ \ldots\ f[K-1]]^T \qquad (7)$$
with the symbol $T$ denoting matrix transpose.
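We now reconsider the optimality of the Wiener
filter in (6) from a different viewpoint. The filtering
operation in (4) can be expressed as
$$\hat{\mathbf{s}}[n] = \mathbf{F}_{LTI}\, \mathbf{x}[n] \qquad (8)$$
where
$$\hat{\mathbf{s}}[n] = [\hat{s}[n]\ \hat{s}[n-1]\ \ldots\ \hat{s}[n-M+1]]^T, \qquad (9)$$
$$\mathbf{x}[n] = [x[n]\ x[n-1]\ \ldots\ x[n-L+1]]^T, \qquad (10)$$
$\mathbf{F}_{LTI}$ is an $M \times L$ matrix given by
$$\mathbf{F}_{LTI} = \begin{bmatrix}
\mathbf{f}^T & 0 & 0 & \cdots & 0 \\
0 & \mathbf{f}^T & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & \mathbf{f}^T
\end{bmatrix} \qquad (11)$$
and $L = K + M - 1$.
The linear lapped transform [1]
$$\hat{\mathbf{s}}[n] = \mathbf{F}\mathbf{x}[n] \qquad (12)$$
where $\mathbf{F}$ is an $M \times L$ constant matrix, is a more general
version of linear filtering than (8). We require that
$M \le L$. When $M = L$, the linear lapped transform
reduces to a linear block transform.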
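The block form (8) of ordinary FIR filtering can be checked numerically. The sketch below is illustrative only; the helper name `lti_matrix`, the filter taps, and the test signal are assumptions. It builds the banded matrix of (11) and confirms that one matrix-vector product reproduces M consecutive outputs of the convolution in (5).

```python
import numpy as np

def lti_matrix(f, M):
    """M x L matrix of (11), with L = K + M - 1 and row i equal to f^T shifted by i."""
    K = len(f)
    F = np.zeros((M, K + M - 1))
    for i in range(M):
        F[i, i:i + K] = f
    return F

f = np.array([0.5, 0.3, 0.2])          # hypothetical FIR taps, K = 3
M = 4
F = lti_matrix(f, M)                   # shape (4, 6)
L = F.shape[1]

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
s_hat = np.convolve(x, f)[:len(x)]     # s_hat[n] = sum_m f[m] x[n-m], as in (5)

n = 20
x_block = x[n - np.arange(L)]          # [x[n], x[n-1], ..., x[n-L+1]] as in (10)
s_block = F @ x_block                  # [s_hat[n], ..., s_hat[n-M+1]] as in (9)
```

A general M x L matrix has M*L free entries while the banded form has only K, which is the extra freedom the lapped transform of (12) exploits.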
A few interesting questions arise. Does the Wiener
filter result in the optimum (in the MMSE sense) es¬
timate s[n]? If it does not, how can we do better and
what is the best estimate?
In this paper, we answer these questions.
2. MINIMUM MEAN SQUARE ERROR
LINEAR PERIODICALLY TIME-VARYING
FILTERING
2.1. Some Basics
Since
$$\hat{\mathbf{s}}[n - lM] = \mathbf{F}\mathbf{x}[n - lM] \qquad (13)$$
for any integer $l$, such a linear lapped transform is in
general a generic LPTV filter with period $M$. The
LPTV filter can be implemented by means of an
$M$-channel multirate filterbank [2].
When $M = 1$, the LPTV filter reduces to an LTI
filter with $L$ taps. For $M > 1$, the LPTV filter reduces
to an LTI filter if and only if the $M \times L$ matrix $\mathbf{F}$ has
the banded structure of $\mathbf{F}_{LTI}$ in (11). This implies that any
LTI filter of length up to $L - M + 1$ is a special case of the
LPTV filter characterized by $\mathbf{F}$. Therefore, the optimum
LPTV filter of size $M \times L$ always possesses a gain over
the Wiener filter of length $L - M + 1$. Such a gain results
from the more flexible processing of data blocks than
the LTI filtering.
For filtering, the two filters require $L$ and $L - M + 1$
multiplications per data sample, respectively. When $M$
is small compared to $L$, their computational complexities
are comparable.
2.2. The Optimum Filter
The model in (1) can be expressed in the vector form:
x{n) = Hs[n] + w[n] (14)
where H is an L x (L + N — 1) matrix given by
' hT 0 0 ... 0 ■
0 hT 0 ... 0
H =
0 0
... 0 hT _
>
(15)
h =
[Mo] Mi]
... h[N - 1]]T,
(16)
s[n] = [s[n] s[n — 1]
and
s[n — L — N + 2]]t ,
(17)
w[n] = [?u[n]
w[n — 1]
. . . w[n — L + l]]r.
(18)
Define the estimation error in the nth block as
$$\mathbf{e}[n] = \hat{\mathbf{s}}[n] - A\mathbf{s}[n] \tag{19}$$
where
$$A = [\,I_M \;\; 0_{M\times(L-M+N-1)}\,], \tag{20}$$
and
$$\mathbf{e}[n] = F(H\mathbf{s}[n] + \mathbf{w}[n]) - A\mathbf{s}[n] \tag{21}$$
$$= (FH - A)\mathbf{s}[n] + F\mathbf{w}[n]. \tag{22}$$
We attempt to design the optimum F to minimize the
mean-square error (MSE) in the block $A\mathbf{s}[n]$, which is
given by
$$J = \frac{1}{M} E\{\mathbf{e}^H[n]\,\mathbf{e}[n]\}. \tag{23}$$
It follows that
$$J = \frac{1}{M} E\{\|(FH - A)\mathbf{s}[n] + F\mathbf{w}[n]\|^2\} \tag{24}$$
$$= \frac{1}{M} E\{\mathrm{tr}[((FH - A)\mathbf{s}[n] + F\mathbf{w}[n])\,((FH - A)\mathbf{s}[n] + F\mathbf{w}[n])^H]\} \tag{25}$$
$$= \frac{1}{M}\mathrm{tr}[(FH - A)R_s(FH - A)^H + FR_wF^H] \tag{26}$$
$$= \frac{1}{M}\mathrm{tr}[F(HR_sH^H + R_w)F^H + AR_sA^H - FHR_sA^H - AR_sH^HF^H] \tag{27}$$
where
$$R_s = E\{\mathbf{s}[n]\mathbf{s}^H[n]\}, \tag{28}$$
$$R_w = E\{\mathbf{w}[n]\mathbf{w}^H[n]\}. \tag{29}$$
Setting
$$\left.\frac{\partial J}{\partial F}\right|_{F=F_{opt}} = 0_{M\times L}, \tag{30}$$
we obtain the matrix form of the Wiener-Hopf equations:
$$F_{opt}(HR_sH^H + R_w) = AR_sH^H. \tag{31}$$
Therefore, the optimum LPTV filter is
$$F_{opt} = AR_sH^H(HR_sH^H + R_w)^{-1} \tag{32}$$
and the resulting minimum MSE is
$$J_{LPTV,min} = \sigma_s^2 - \frac{1}{M}\mathrm{tr}[AR_sH^H(HR_sH^H + R_w)^{-1}HR_sA^H]. \tag{33}$$
The optimum filter can be viewed as the extension
of the FIR Wiener filter to LPTV systems. Indeed, when
M = 1, $F_{opt}$ reduces to the FIR Wiener filter with L
taps. On the other hand, for D = 1, 2, ..., M, the
Dth row of the filtering matrix $F_{opt}$ is the MMSE FIR
filter for estimating s[n - D + 1] from the data set
$\{x[m] : -\infty < m \le n\}$.
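As an illustration of (15), (20), (32) and (33), the following NumPy sketch builds H and A and solves the Wiener-Hopf equations (31). The function names `convolution_matrix` and `optimum_lptv` are hypothetical, and the covariance matrices are assumed to be supplied by the caller; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def convolution_matrix(h, L):
    """L x (L+N-1) matrix H of (15): row i holds h^T shifted right by i."""
    N = len(h)
    H = np.zeros((L, L + N - 1), dtype=complex)
    for i in range(L):
        H[i, i:i + N] = h
    return H

def optimum_lptv(h, L, M, Rs, Rw):
    """Return (F_opt, J_min) from (32) and (33).

    Rs is the (L+N-1) x (L+N-1) signal covariance, Rw the L x L noise
    covariance (both assumed known here).
    """
    N = len(h)
    H = convolution_matrix(h, L)
    # A = [I_M 0] selects the first M entries of s[n], as in (20).
    A = np.hstack([np.eye(M), np.zeros((M, L + N - 1 - M))])
    C = H @ Rs @ H.conj().T + Rw          # H Rs H^H + Rw
    G = A @ Rs @ H.conj().T               # A Rs H^H
    # Solve F C = G without forming C^{-1} explicitly.
    F = np.linalg.solve(C.conj().T, G.conj().T).conj().T
    # At the optimum, (27) collapses to (1/M) tr[A Rs A^H - F H Rs A^H].
    J = np.real(np.trace(A @ Rs @ A.conj().T - F @ H @ Rs @ A.conj().T)) / M
    return F, J
```

For example, with a white unit-variance signal, white noise of variance 0.1 and h = [1, 0.5], calling `optimum_lptv(h, 6, 2, np.eye(7), 0.1 * np.eye(6))` returns a 2 x 6 filtering matrix whose MSE can be compared with that of the FIR Wiener filter of length L - M + 1 = 5 obtained with M = 1.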
2.3. When Is There No Gain?
In general, the performance of the optimum LPTV filter
is better than the performance of the optimum LTI
filter in the sense that
$$J_{LPTV,min} \le J_{LTI,min} \tag{34}$$
where the equality holds if and only if
• the signal s[n] is a white noise process, i.e.,
$$E\{s[n]s^*[n+l]\} = \sigma_s^2\,\delta[l], \tag{35}$$
• the noise w[n] is a white noise process, i.e.,
$$E\{w[n]w^*[n+l]\} = \sigma_w^2\,\delta[l], \tag{36}$$
and
• the LTI system h[n] has only one tap, i.e., N = 1.
2.4. Asymptotic Performance Analysis
We assume that s[n] and w[n] are both regular processes
with rational power spectra.
Let $f_D[n]$ denote the causal, infinite impulse response
(IIR) Wiener deconvolution filter for estimating
s[n - D] from the data set $\{x[m] : -\infty < m \le n\}$,
where $D \ge 0$ indicates a delay. The performance of
$f_D[n]$ shall be used in our analysis of the asymptotic
performance of the proposed optimum LPTV filter.
The transfer function of $f_D[n]$ is given by
$$F_D(z) = \frac{1}{\sigma_0^2 Q(z)}\left[\frac{z^{-D}P_s(z)H^*(1/z^*)}{Q^*(1/z^*)}\right]_+ \tag{37}$$
where Q(z) is the monic, minimum-phase factor determined
by the spectral factorization of the power spectrum
of x[n]:
$$P_x(z) = H(z)H^*(1/z^*)P_s(z) + P_w(z) \tag{38}$$
$$= \sigma_0^2\,Q(z)\,Q^*(1/z^*) \tag{39}$$
and the subscript "+" is used to indicate the "positive-time
part" of the sequence whose z-transform is contained
within the brackets. The resulting MSE is
$$J^{(D)}_{IIR,min} = r_s[0] - \sum_{l=0}^{\infty}\sum_{n=0}^{N-1} f_D[l]\,h[n]\,r_s[D-n-l] \tag{40}$$
or
$$J^{(D)}_{IIR,min} = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_s(e^{j\omega})\left[1 - F_D(e^{j\omega})H(e^{j\omega})e^{j\omega D}\right]d\omega. \tag{41}$$
The derivations of the optimum filter $f_D[n]$ and its
associated MSE are given in the Appendix.
In general, increasing the delay D leads to a smaller
$J^{(D)}_{IIR,min}$. Asymptotically, $f_\infty[n]$ is the non-causal IIR
Wiener filter.
The performance of estimating the signal block $A\mathbf{s}[n]$
using the filtering matrix F can be improved if more
observed data are processed, or equivalently, the parameter
L is increased. As L tends to infinity, the
MMSE FIR filter converges to the MMSE causal IIR
filter. Therefore,
$$J_{LTI,min} \ge \lim_{L\to\infty} J_{LTI,min} \tag{42}$$
$$= J^{(0)}_{IIR,min}. \tag{43}$$
If M is fixed and L tends to infinity, then the Dth
row of the optimum filtering matrix $F_{opt}$ converges to
$f_{D-1}[n]$. Therefore,
$$J_{LPTV,min} \ge \lim_{L\to\infty} J_{LPTV,min} \tag{44}$$
$$= \frac{1}{M}\sum_{D=0}^{M-1} J^{(D)}_{IIR,min}. \tag{45}$$
If both M and L approach infinity with K = L - M + 1 fixed, then
$$\lim_{L\to\infty,\,M\to\infty} J_{LPTV,min} = \lim_{M\to\infty}\frac{1}{M}\sum_{D=0}^{M-1} J^{(D)}_{IIR,min} \tag{46}$$
$$= J^{(\infty)}_{IIR,min}, \tag{47}$$
which corresponds to the MSE of the non-causal IIR
Wiener filter.
In summary, the optimum LPTV filter outperforms
the Wiener filter asymptotically.
3. CONCLUSION
We have presented the MMSE linear periodically time-varying
deconvolution filter. The proposed filter outperforms
its linear time-invariant counterpart at the
expense of an increase in computational complexity and
delay.
APPENDIX
We now prove (37) and (40).
According to the model given in (1), we first whiten
the process x[n] to obtain a unit-variance white noise
process:
$$y[n] = (b * x)[n] \tag{48}$$
where the whitening filter b[n] is given by
$$B(z) = \frac{1}{\sigma_0 Q(z)} \tag{49}$$
which is causal and stable. Next, we obtain the estimate
of s[n - D] by filtering y[n] with a causal IIR filter g[n]:
$$\hat{s}[n-D] = \sum_{m=0}^{\infty} g[m]\,y[n-m]. \tag{50}$$
To minimize $E\{|s[n-D] - \hat{s}[n-D]|^2\}$ with respect to
g[n], we use the orthogonality principle to obtain the
Wiener-Hopf equations
$$E\{(s[n-D] - \hat{s}[n-D])\,y^*[n-k]\} = 0 \tag{51}$$
for $0 \le k < \infty$, or equivalently,
$$r_{sy}[k-D] = \sum_{m=0}^{\infty} g[m]\,r_y[k-m] \tag{52}$$
$$= g[k] \tag{53}$$
for $0 \le k < \infty$. Therefore,
$$G(z) = [z^{-D}P_{sy}(z)]_+. \tag{54}$$
Since
$$r_{sy}[k] = E\{s[n]\,y^*[n-k]\} \tag{55}$$
$$= E\Big\{s[n]\Big(\sum_{l=0}^{\infty} b[l]\,x[n-k-l]\Big)^*\Big\} \tag{56}$$
$$= \sum_{l=0}^{\infty} b^*[l]\,r_{sx}[k+l] \tag{57}$$
$$= \sum_{l=0}^{\infty} b^*[l]\,E\Big\{s[n+k+l]\Big(\sum_{m=0}^{N-1} h[m]\,s[n-m] + w[n]\Big)^*\Big\} \tag{58}$$
$$= \sum_{l=0}^{\infty}\sum_{m=0}^{N-1} b^*[l]\,h^*[m]\,r_s[k+l+m], \tag{59}$$
which implies that
$$P_{sy}(z) = B^*(1/z^*)\,H^*(1/z^*)\,P_s(z) \tag{60}$$
$$= \frac{H^*(1/z^*)\,P_s(z)}{\sigma_0\,Q^*(1/z^*)}, \tag{61}$$
the causal, IIR Wiener deconvolution filter for
estimating s[n - D] from the data set $\{x[m] : -\infty < m \le n\}$
is given by
$$F_D(z) = B(z)\,G(z) \tag{62}$$
$$= \frac{1}{\sigma_0 Q(z)}\left[\frac{z^{-D}H^*(1/z^*)P_s(z)}{\sigma_0\,Q^*(1/z^*)}\right]_+. \tag{63}$$
Since
$$r_{sx}[k] = E\Big\{s[m]\Big(\sum_{n=0}^{N-1} h[n]\,s[m-k-n]\Big)^*\Big\} + E\{s[n]\,w^*[n-k]\} \tag{64}$$
$$= \sum_{n=0}^{N-1} h^*[n]\,r_s[k+n], \tag{65}$$
the resulting MSE is
$$J^{(D)}_{IIR,min} = E\{(s[n-D] - \hat{s}[n-D])\,s^*[n-D]\} \tag{66}$$
$$= r_s[0] - E\Big\{\sum_{l=0}^{\infty} f_D[l]\,x[n-l]\,s^*[n-D]\Big\} \tag{67}$$
$$= r_s[0] - \sum_{l=0}^{\infty} f_D[l]\,r^*_{sx}[l-D] \tag{68}$$
$$= r_s[0] - \sum_{l=0}^{\infty}\sum_{n=0}^{N-1} f_D[l]\,h[n]\,r^*_s[l-D+n] \tag{69}$$
$$= r_s[0] - \sum_{l=0}^{\infty}\sum_{n=0}^{N-1} f_D[l]\,h[n]\,r_s[D-l-n]. \tag{70}$$
REFERENCES
[1] H. S. Malvar, Signal Processing with Lapped Transforms.
Boston, MA: Artech House, 1992.
[2] P. P. Vaidyanathan, Multirate Systems and Filter
Banks. Englewood Cliffs, NJ: Prentice-Hall, 1992.
Fast Approximated Sub-Space Algorithms
Mohammed A. Hasan† and Ali A. Hasan‡
†Dept. of Electrical & Computer Engineering, University of Minnesota Duluth
‡College of Electronic Engineering, Bani Waleed, Libya
Abstract
In this paper, fast techniques for invariant subspace separation
with applications to the DOA and the harmonic
retrieval problems are presented. The main feature of
these techniques is that they are computationally efficient,
as they can be implemented in parallel and can
be transformed into matrix inverse-free algorithms. The
basic operations used are the QR factorization and matrix
multiplication. Specifically, two types of methods
are developed. The first method uses a Newton-like iteration
and is quadratically convergent. The second method
can be developed to have convergence of any prescribed
order. Using these approximations, the minimum norm
solution for the DOA and the harmonic retrieval problems
for the projection of the least-squares weight onto the
signal subspace of the data is obtained simply, without
performing any SVD. Some of the developed methods
are also examined on several test problems.
1. Introduction
The estimation of projections onto a selected set of invariant
subspaces of data and covariance matrices is a common
requirement in the development of high resolution
methods. This situation arises in adaptive processing of
sensor array data or sums of sinusoids, where the estimation
of the number of strong signals present in a given
set of data and of the projections onto the signal and noise
subspaces is essential. Subspace based methods for frequency
estimation rely on a low rank system model that
is obtained by organizing the observed data samples into
vectors. MUSIC and ESPRIT based estimators are then
obtained using this vector model.
Projection of the least-squares weight vector onto a
subspace of reduced dimension is an established technique
for reducing the number of adaptive degrees of
freedom used by an adaptive sensor array. The main
problem is that conventional algorithms for subspace estimation
based upon the eigenvalue decomposition (EVD) or
the singular value decomposition (SVD) are both
expensive to compute and difficult to make recursive or
implement in parallel. In contrast, algorithms based on
the QR factorization have established pipelinable architectures.
Since many signal processing applications (e.g. projection
beamforming, MUSIC) do not explicitly utilize
the full set of signal eigenvalues, diagonalizing the covariance
matrix of the data is not necessarily advantageous
and is not required. Various alternatives were
proposed by several authors. Kay and Shaw [1] suggested
the use of polynomials and rational functions of
the sample covariance matrix for approximating the signal
subspace. In [2], Tufts and Melissinos used Lanczos
and power-type methods to approximate the signal subspace.
Karhunen and Joutsensalo [3] approximated the
signal subspace using the discrete Fourier and cosine
transforms. Ermolaev and Gershman [4] used powers
of the sample covariance matrix based on Krylov subspaces
to approximate the noise subspace when the number of
impinging signals and a threshold which separates the
signal and noise eigenvalues are known a priori. In this
work, we assume that a rough estimate of a threshold
is known. For useful articles and books, the reader is
referred to [5], [6]-[8] and the references therein.
The proposed algorithms could prove useful if a
threshold that separates noise and signal eigenvalues is
known. This threshold can, in some cases, be obtained
by tracking subspaces where the largest eigenvalue of the current
noise subspace, the smallest eigenvalue of the current
signal subspace, or the power level of the noise floor is
known. In these cases the proposed algorithms can help
speed up the computation for the final estimation of the subspaces.
Another application is when the rank of the signal
subspace is known.
2. Data Model
The N samples of a scalar valued signal y(n) are assumed
to be the sum of M complex-valued sinusoids in
additive zero mean white Gaussian noise
$$x_k(n) = a_k e^{j(w_k n + \phi_k)}, \quad k = 1, 2, \ldots, M,$$
$$y(n) = \sum_{k=1}^{M} x_k(n) + v(n), \quad n = 1, 2, \ldots, N. \tag{1}$$
Here $a_k > 0$ is the amplitude, the frequencies
$w_1, \ldots, w_M$ are assumed to be distinct parameters, and
the phases $\phi_k$ are assumed to be uniformly distributed
on $[0, 2\pi]$ and mutually independent. The noise
v(n) is assumed to be independent of the phases and to
satisfy
$$E\{v(n)v^*(n-k)\} = \sigma_v^2\,\delta(k), \tag{2}$$
where $(\cdot)^*$ denotes complex conjugate and $\delta(\cdot)$ is the
Kronecker delta function. A low rank matrix representation
of the problem is obtained by collecting L > M
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
received samples in a column vector
$$\mathbf{y}(n) = [\,y(n) \;\; y(n+1) \;\; \cdots \;\; y(n+L-1)\,]^T, \tag{3a}$$
where $(\cdot)^T$ denotes the matrix transpose.
The notation $\mathbf{x}(n)$ will denote the vector
$$\mathbf{x}(n) = [\,x_1(n) \;\; x_1(n+1) \;\; \cdots \;\; x_M(n+L-1)\,]^T. \tag{3b}$$
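Hence $\mathbf{y}(n)$ can be written as
$$\mathbf{y}(n) = V(w)\mathbf{x}(n) + \mathbf{v}(n), \quad n = 1, \ldots, N-L+1, \tag{4}$$
where the additive noise vector, $\mathbf{v}(n)$, is defined similarly
to $\mathbf{y}(n)$ in (3a) and V(w) is an L x M Vandermonde
matrix given by
$$V(w) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{jw_1} & e^{jw_2} & \cdots & e^{jw_M} \\ \vdots & \vdots & & \vdots \\ e^{j(L-1)w_1} & e^{j(L-1)w_2} & \cdots & e^{j(L-1)w_M} \end{bmatrix}. \tag{5}$$
The argument w is omitted in the sequel when not
required. The covariance matrix, $R_y$, of the received windowed
sequence is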
$$R_y = E\{\mathbf{y}(n)\mathbf{y}^*(n)\} = VDV^* + \sigma_v^2 I_L, \tag{6}$$
where the covariance matrix $D = \mathrm{diag}(\alpha_1, \ldots, \alpha_M)$ is
diagonal. The matrix $I_L$ is the identity matrix of size
L. Note that
$$R_x = E\{\mathbf{x}(n)\mathbf{x}^*(n)\} = VDV^*. \tag{7}$$
A similar formulation can also be obtained for the direction
of arrival (DOA) problem, except that in that case the
matrix D is not necessarily diagonal.
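The low-rank-plus-noise structure of (6) can be checked numerically. The sketch below builds the theoretical covariance from (5) and (6); the function names and the chosen frequencies and powers are illustrative assumptions only, and the diagonal of D is taken to hold the sinusoid powers.

```python
import numpy as np

def vandermonde(w, L):
    """L x M Vandermonde matrix V(w) of (5)."""
    rows = np.arange(L)[:, None]
    return np.exp(1j * rows * np.asarray(w, dtype=float)[None, :])

def model_covariance(w, powers, sigma2, L):
    """Theoretical covariance R_y = V D V^* + sigma_v^2 I_L of (6)."""
    V = vandermonde(w, L)
    D = np.diag(powers)
    return V @ D @ V.conj().T + sigma2 * np.eye(L)

# Two sinusoids (M = 2) observed through windows of length L = 8.
w = 2 * np.pi * np.array([0.10, 0.27])
Ry = model_covariance(w, [1.0, 0.5], sigma2=0.1, L=8)
eigenvalues = np.sort(np.linalg.eigvalsh(Ry))   # ascending order
```

In this example the L - M = 6 smallest entries of `eigenvalues` equal the noise variance 0.1 and the two largest lie above it, which is the gap that a threshold is meant to separate.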
In this paper, it is shown that if a threshold that
separates the signal and noise eigenvalues, or the dimension
of the signal subspace, is a priori known, the
subspace estimate can be obtained using the QR factorization
of a large power of the covariance matrix.
3. Invariant Subspace Computation
Let A be a Hermitian matrix, and let $P_0$ and $P_1$ denote
the orthogonal projections onto the invariant subspaces
corresponding to eigenvalues inside and outside
the interval $(-|b|, |b|)$, where b is a nonzero number. An elegant
method for computing those invariant subspaces is presented
next. Consider the sequence of matrices defined
by
$$S_k = (b^k I_L - A^k)(b^k I_L + A^k)^{-1}, \tag{8}$$
then the eigenvalues of $S_k$, given by
$\left\{\frac{b^k - \lambda_j^k}{b^k + \lambda_j^k}\right\}_{j=1}^{L}$, converge
to 1 or -1 as $k \to \infty$. Thus $S_k$ is bounded for
all sufficiently large k. It can be shown that the sequence
$S_k$ converges to a matrix S satisfying $S^2 = I_L$ and
SA = AS. Moreover, S and A have the same invariant
subspaces inside and outside a circle of radius |b|
centered at the origin. If (8) is computed directly using
powers of the matrix A, over- and under-flow will occur.
Since the sample covariance matrix is generally positive
semidefinite, we will apply this iteration to the shifted
matrix. A fast implementation for computing the limit of
the sequence, which also avoids the problem of
over- and under-flow, will be given next.
Algorithm 1:
$$S_0 = R_y - bI_L,$$
$$S_{k+1} = \{(I_L + S_k)^r - (I_L - S_k)^r\}\{(I_L + S_k)^r + (I_L - S_k)^r\}^{-1}. \tag{9}$$
It can be shown that $S_k$ satisfies the following elegant
error formula
$$(S_{k+1} + S)^{-1}(S_{k+1} - S) = \{(S_k + S)^{-1}(S_k - S)\}^r = \{(S_0 + S)^{-1}(S_0 - S)\}^{r^{k+1}}. \tag{10}$$
This method can be made to converge at any desired
rate by choosing an appropriate r. From several numerical
experiments, it was observed that for r = 2 a
suitable number of iterations is K = 5, while K = 3 if r = 3.
Once the desired convergence is obtained, the signal and
noise subspace projections are approximated by the two
complementary projections $(I_L \pm S_K)/2$.
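Iteration (9) can be sketched in a few lines of NumPy. The function name `subspace_projections` and the default parameters are assumptions for illustration. Started from $S_0 = R_y - bI_L$, the iterate in this sketch tends to a matrix with eigenvalue +1 on eigenvectors of $R_y$ whose eigenvalue exceeds b and -1 on the others, so $(I_L + S_K)/2$ is returned as the projection onto the eigenvalues above the threshold.

```python
import numpy as np

def subspace_projections(Ry, b, r=2, K=6):
    """Run K steps of iteration (9) and return the two projections.

    Returns (P_above, P_below): projections onto the eigenvectors of Ry
    with eigenvalues above and below the threshold b, respectively.
    """
    L = Ry.shape[0]
    I = np.eye(L)
    S = Ry - b * I
    for _ in range(K):
        P = np.linalg.matrix_power(I + S, r)
        Q = np.linalg.matrix_power(I - S, r)
        # S_{k+1} = (P - Q)(P + Q)^{-1}; P + Q is Hermitian here.
        S = (P - Q) @ np.linalg.inv(P + Q)
    return (I + S) / 2, (I - S) / 2
```

Convergence is slow when an eigenvalue of $R_y$ lies close to b, or when the eigenvalues of $R_y - bI_L$ are very large in magnitude, so scaling the matrix or increasing K may be needed in such cases.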
The next results provide quadratically convergent
methods for subspace computation. The significance
of the next theorem is that it computes the projection
matrix for the subspaces whose eigenvalues fall between
two numbers a and b.
Theorem 1. Let $X_0 = R_y$ be an L x L nonsingular
matrix and let 0 < a < b be two positive numbers. Let
$X_k$ be generated using
$$X_{k+1} = (2X_k - (a+b)I_L)^{-1}(X_k^2 - (a+b)X_k + abI_L), \tag{11a}$$
where $I_L$ is the L x L identity matrix. Then $X_k$ converges
quadratically to $S = aQ_1 + bQ_2$, where $Q_1$ and $Q_2$ are
the projections onto the span of all eigenvectors of $R_y$
whose corresponding eigenvalues are in the right and
left half planes of the line which perpendicularly bisects
the segment between a and b. Moreover, $Q_1 = \frac{bI_L - S}{b - a}$,
$Q_2 = \frac{S - aI_L}{b - a}$, and $X_k$ satisfies the following error formula
$$(X_{k+1} + S)^{-1}(X_{k+1} - S) = \{(X_k + S)^{-1}(X_k - S)\}^2. \tag{11b}$$
It should be stated that the above result holds true
for any two numbers $a \ne b$. In this case, if a + b = 0
with $a \ne 0$, then the subspace decomposition reduces to
computing the projections onto the subspaces spanned
by the eigenvectors with eigenvalues having positive and
negative real parts, respectively. Specifically, if a =
-b = 1, the matrix S reduces to the matrix sign function
of $X_0$.
When a threshold b which separates the signal and
noise eigenvalues is a priori known, the suggested
approach will be very effective in extracting the signal
and noise subspaces. More generally, one can derive a
stable and quadratically convergent algorithm for computing
the invariant subspace of the matrix A in the
half-plane with boundary determined by the line which
perpendicularly bisects the line segment between z = 0
and z = 2b.
Theorem 2. Let A be a nonsingular matrix of size L
and let $b \ne 0$ be a complex number. For k = 1, 2, ...,
compute
$$Z_{k+1} = \tfrac{1}{2}Z_k(Z_k - bI_L)^{-1}Z_k, \tag{12a}$$
with $Z_1 = A$. Then the sequence $Z_k$ converges to 2bZ,
where Z is the projection onto the subspace spanned
by all eigenvectors whose eigenvalues are in the right
half plane with boundary determined by the line which
perpendicularly bisects the line segment between z = 0
and z = 2b.
The quadratic convergence of this algorithm can be
seen from the error formula, which can be shown to be
$$(Z_{k+1} - 2bZ)Z_{k+1}^{-1} = \{(Z_k - 2bZ)Z_k^{-1}\}^2. \tag{12b}$$
Note that the matrix inverse in (12a) can be avoided by
utilizing the Schulz iteration [9].
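Iteration (12a) is Newton's method applied to z(z - 2b) = 0, so each eigenvalue of A is driven either to 0 or to 2b. A minimal NumPy sketch follows; the function name `half_plane_projection` and the iteration count are assumptions, and a linear solve is used in place of an explicit inverse.

```python
import numpy as np

def half_plane_projection(A, b, K=12):
    """Iterate (12a) from Z_1 = A and return Z_K / (2b).

    For a Hermitian A with real b > 0 the result approximates the
    projection onto the eigenvectors whose eigenvalues exceed b.
    """
    L = A.shape[0]
    I = np.eye(L)
    Z = np.array(A, dtype=complex)
    for _ in range(K):
        # Z (Z - b I)^{-1} Z computed through a linear solve.
        Z = 0.5 * Z @ np.linalg.solve(Z - b * I, Z)
    return Z / (2 * b)
```

The iteration breaks down if some eigenvalue of an iterate equals b exactly, since $Z_k - bI_L$ is then singular; in the scalar case this corresponds to a starting point on the bisecting line itself.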
The main disadvantage of (9) and (12) is that they
require the computation of a matrix inverse. In the following
result an implementation of (9) which avoids matrix
inverse computation is given.
Theorem 3. Let b be a threshold which separates the
signal and noise eigenvalues of the positive definite matrix
$R_y$. Let $S_k$ be a sequence generated as follows:
$$S_0 = R_y - bI_L,$$
$$\begin{bmatrix} (I_L + S_k)^r \\ (I_L - S_k)^r \end{bmatrix} = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}\begin{bmatrix} R_k \\ 0 \end{bmatrix}, \quad k = 0, 1, 2, \ldots, K, \tag{13}$$
$$S_{k+1} = (Q_{11} + Q_{21})(Q_{11} - Q_{21})^*,$$
then $S_k$ converges to $P_{\lambda<|b|} - P_{\lambda>|b|}$.
Note that the middle step in Equation (13) involves a
QR decomposition. This provides an rth order convergent
algorithm for computing the projections onto invariant
subspaces to the left and right of the line z = b.
Once S is computed accurately, the eigen-spaces
can be obtained from the QR factorization of $I_L - S$, i.e., if
$I_L - S = QR$, then
$$Q^*(R_y - bI_L)Q = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix},$$
where all eigenvalues of $A_1$ are inside the interval $[-|b|, |b|]$
and those of $A_2$ are outside that interval. This process
can be repeated if necessary on the smaller matrices $A_1$ and
$A_2$. Initial tests of this algorithm have shown that this
implementation is stable and convergent even when the
matrix A has an eigenvalue as small in magnitude as
$10^{-13}$.
We should note that in Iteration (13), orthogonal
projections are obtained using only matrix multiplication
and the QR factorization. This method can be
made to converge at any desired rate by choosing an
appropriate r.
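An inverse-free iteration in the spirit of (13) can be sketched as follows. The update used here, $S_{k+1} = (Q_{11} + Q_{21})(Q_{11} - Q_{21})^*$, is one reading of the printed update and should be treated as an assumption, as should the function name and the symmetrisation step, which is added only to suppress round-off. With $P = (I_L + S_k)^r$ and $N = (I_L - S_k)^r$ Hermitian and commuting, this update equals $(P^2 - N^2)(P^2 + N^2)^{-1}$, so in this sketch the limit has eigenvalue +1 on eigenvectors of $R_y$ with eigenvalue above b and -1 on the others.

```python
import numpy as np

def qr_sign_iteration(Ry, b, r=1, K=8):
    """Inverse-free sign-type iteration using only products and QR."""
    L = Ry.shape[0]
    I = np.eye(L)
    S = Ry - b * I
    for _ in range(K):
        stacked = np.vstack([np.linalg.matrix_power(I + S, r),
                             np.linalg.matrix_power(I - S, r)])
        Q, _ = np.linalg.qr(stacked)       # reduced QR: Q is 2L x L
        Q11, Q21 = Q[:L], Q[L:]
        S = (Q11 + Q21) @ (Q11 - Q21).conj().T
        S = (S + S.conj().T) / 2           # keep S Hermitian (assumption)
    return S
```

The product is unaffected by the sign ambiguity of the QR factors, because any diagonal sign matrix applied to the columns of Q cancels between the two factors.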
Algorithm 2:
Using an analogous derivation, we obtained another
inverse-free implementation of (13) for Hermitian matrices,
which is given as follows:
P0 = Ry ~ blL
Pk _ Qu Ql2 R)
{ip-PkY\ - [Q21 Q22J 0
Pk+ 1 = Qu(Qu — Q21 ) .
k = 0,1,2, ■■■, K
then $P_k$ converges to an orthogonal projection. Let
$P_{k+1} = QR$ be a QR factorization, then $Q^*AQ$ is block
diagonal. This algorithm indicates that projections onto
half-planes can be obtained using only matrix multiplication
and QR factorization.
4. Estimation of a Threshold
The performance of estimators based on the approximations
given in the previous section depends mainly
on the accuracy of the threshold that separates the signal
and noise eigenvalues, or on a priori knowledge of the
dimension of the signal subspace.
Since $R_y$ is Hermitian, it has the eigendecomposition
$R_y = \sum_{i=1}^{L} \lambda_i u_i u_i^*$, where $\lambda_i$ and $u_i$ are the ith
eigenvalue and the corresponding ith eigenvector. For convenience,
it is assumed that the eigenvalues are sorted
in decreasing order so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M > \lambda_{M+1} =
\cdots = \lambda_L = \sigma_v^2$, with corresponding eigenvectors $\{u_i\}_{i=1}^{L}$.
The eigenvectors $\{u_i\}_{i=1}^{M}$ are usually called the signal
vectors and the eigenvectors $\{u_i\}_{i=M+1}^{L}$ are called the
noise vectors. If the average of the signal eigenvalues
is denoted by $\lambda_s$, then one can show that $\mathrm{trace}(R_y)/L$ is a
good estimate of the threshold provided that L is sufficiently
large. The main requirement for this threshold
is $\sigma_v^2 < \mathrm{trace}(R_y)/L < \lambda_M$, which holds provided
$$\frac{L-M}{M}(\lambda_M - \sigma_v^2) > \lambda_s - \lambda_M.$$
Note that in this inequality the only parameter that can
be varied is L. Clearly, if L is much larger than M so
that $(L-M)/M \gg 1$, then the above inequality will hold.
Although this threshold is very simple to compute, it
holds only for the theoretical covariance matrix, i.e., when all
noise eigenvalues are the same. Another observation
is that this inequality holds for smaller L if the spread of the signal
eigenvalues is small, so that the difference $\lambda_s - \lambda_M$ is
small, or if $\lambda_M - \sigma_v^2$ is large.
Note that for M = 2, the inequality reduces to
$$(L-2)(\lambda_2 - \sigma_v^2) > \lambda_1 - \lambda_2.$$
Also, in the hypothetical case in which all signal eigenvalues
are equal, the above threshold is always valid for
any L > M.
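The trace-based threshold and the sufficient condition above are simple to evaluate. The sketch below uses hypothetical function names and made-up eigenvalues purely to illustrate when the estimate $\mathrm{trace}(R_y)/L$ falls between $\sigma_v^2$ and $\lambda_M$.

```python
import numpy as np

def trace_threshold(Ry):
    """Threshold estimate trace(Ry) / L."""
    return float(np.real(np.trace(Ry))) / Ry.shape[0]

def threshold_is_valid(signal_eigs, sigma2, L):
    """Check ((L - M)/M) (lambda_M - sigma2) > lambda_s - lambda_M."""
    lam = np.sort(np.asarray(signal_eigs, dtype=float))[::-1]
    M = lam.size
    lam_s, lam_M = lam.mean(), lam[-1]
    return bool((L - M) / M * (lam_M - sigma2) > lam_s - lam_M)

# M = 2 signal eigenvalues and L - M = 8 noise eigenvalues equal to 0.1.
Ry = np.diag([6.0, 5.0] + [0.1] * 8)
threshold = trace_threshold(Ry)
```

Here the threshold lies between the noise level 0.1 and the smallest signal eigenvalue 5, and `threshold_is_valid([6.0, 5.0], 0.1, 10)` agrees. With a short window and a wide eigenvalue spread, for instance signal eigenvalues 10 and 1 with L = 3, the condition fails and the trace estimate exceeds the smallest signal eigenvalue.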
When $\lambda_s - \lambda_M$ is large, one can use a sharper estimate
of the threshold based on
$$\mu = \frac{\sum_{i=1}^{L}\sqrt{\lambda_i}}{L} = \frac{T}{L}.$$
This estimate can be computed from the covariance matrix,
but the computation is very lengthy and complicated
even when L is low. For example, when L = 2
the value of $\mu$ can be estimated from
$$T^2 = \mathrm{trace}(R_y) + 2\sqrt{\det(R_y)},$$
where $T = \mu L$. For L = 3, T can be estimated by
solving the following equation
$$(T^2 - a)^2 - 4b = 8\sqrt{c}\,T,$$
where a, b, c are determined from the characteristic polynomial
of $R_y$ given by $\lambda^3 - a\lambda^2 + b\lambda - c$.
5. Simulation Results
In this section, frequency estimators based on subspace
approximations are examined on several data sets generated
by the equation
$$y(n) = d_1 e^{j(2\pi f_1 n + \phi_1)} + d_2 e^{j(2\pi f_2 n + \phi_2)} + v(n), \tag{15}$$
where $d_1 = 1.0$, $d_2 = 1.0$, $f_1 = 0.5$, $f_2 = 0.52$ and
n = 1, 2, ..., N = 25. The $\phi_i$ are independent random
variables uniformly distributed over the interval
$[-\pi, \pi]$. The noise v(n) is assumed to be white and
uncorrelated with the signal. Note that $f_2 - f_1 < 1/N$.
The SNR for either sinusoid is defined as $10\log_{10}(\sigma_x^2/\sigma_v^2)$,
where $x(n) = d_1 e^{j(2\pi f_1 n + \phi_1)} + d_2 e^{j(2\pi f_2 n + \phi_2)}$ and $\sigma_x^2$,
$\sigma_v^2$ are the variances of x(n) and v(n), respectively. The
size of the covariance matrix is chosen to be L = 10,
which in the absence of noise has effective rank two. We
performed experiments to compare the proposed methods
with the truncated SVD-based MUSIC. The SVD
routine in MATLAB is used for the computation of the
signal subspace eigenvectors and eigenvalues required
to implement an SVD-based method for comparison. We
varied the SNR from 10 to 20 dB in 5 dB steps and estimated
the frequencies for data length 25. For each experiment
(with data length and SNR fixed), we performed 100
independent trials to estimate the frequencies. We use
the following performance criterion (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{N_e}\sum_{i=1}^{N_e}(\hat{f}_i - f_{true})^2}$$
to compare the results. Here $N_e$ is the number of independent
realizations, and $\hat{f}_i$ is the estimate provided
by the ith realization. Several experiments were conducted
to test the performance of the algorithms presented
in Theorem 3 and of the SVD-based MUSIC. The
mean values of the estimated frequencies and their RMSE
for the SVD-based MUSIC are given in Table 1.
SNR     mean f1    mean f2    RMSE f1    RMSE f2
20 dB   0.500556   0.522322   0.00563    0.012522
15 dB   0.500729   0.521735   0.00652    0.014531
10 dB   0.500961   0.524952   0.00813    0.019204
Table 1: Mean and RMSE of frequencies for data of
two complex sinusoids at frequencies 0.50 and 0.52
in noise with SNR = 20, 15, 10 dB, dimension of data
vectors L = 10. Theorem 3 is used.
References
[1] Kay S. M. and Shaw A. K., "Frequency Estimation
by Principal Component AR Spectral Estimation
Method without Eigendecomposition," IEEE Trans.
on Acoustics, Speech, and Signal Processing, Vol. 36,
No. 1, pp. 95-101, January 1988.
[2] Tufts D. and Melissinos C. D., "Simple, Effective
Computation of Principal Eigenvectors and Their
Eigenvalues and Application to High-Resolution Estimation
of Frequencies," IEEE Trans. on Acoustics,
Speech, and Signal Processing, Vol. 34, No. 5, pp.
1046-1053, October 1986.
[3] Karhunen J. T. and Joutsensalo J., "Sinusoidal Frequency
Estimation by Signal Subspace Approximation,"
IEEE Trans. on Acoustics, Speech, and Signal
Processing, Vol. ASSP-40, No. 12, pp. 2961-2972,
December 1992.
[4] Ermolaev V. T. and Gershman A. B., "Fast Algorithm
for Minimum-Norm Direction-of-Arrival Estimation,"
IEEE Trans. on Signal Processing, Vol. 42,
No. 9, pp. 2389-2394, September 1994.
[5] Kay S. M., Modern Spectral Estimation, Theory and
Applications, Englewood Cliffs, NJ: Prentice-Hall,
1988.
[6] Hasan M. A. and Hasan A. A., "Hankel Matrices of
Finite Rank with Applications to Signal Processing
and Polynomials," J. of Math. Anal. and Appls., 208,
pp. 218-242, 1997.
[7] Hasan M. A. and Azimi-Sadjadi M. R., "Separation of
Multiple Time Delays Using New Spectral Estimation
Schemes with Applications to Underwater Target Detection,"
IEEE Trans. on Signal Processing, Vol. 46,
No. 6, pp. 1580-1590, June 1998.
[8] Hasan M. A., "DOA and Frequency Estimation Using
Fast Sub-Space Algorithms," accepted for publication
in Journal of Signal Processing.
[9] Stoer J. and Bulirsch R., Introduction to Numerical
Analysis, Springer-Verlag, New York, 1980.
STOCHASTIC ALGORITHMS FOR MARGINAL MAP RETRIEVAL OF SINUSOIDS IN
NON-GAUSSIAN NOISE
Christophe Andrieu - Arnaud Doucet
Signal Processing Group, University of Cambridge
Department of Engineering, Trumpington Street
CB2 1PZ Cambridge, UK
Email: ca226@eng.cam.ac.uk - ad2@eng.cam.ac.uk
ABSTRACT
In this paper we propose a method to estimate the frequencies of
sinusoids embedded in non-Gaussian noise. We model the noise
using mixtures of Gaussians and propose two original, efficient
algorithms that allow for the marginal MAP estimation of the
sinusoid parameters. An outline of the proof of convergence
of the algorithms is also given and simulation results are
presented.
1 Introduction
The harmonic retrieval problem is a fundamental problem in signal
processing that has numerous applications in radar, seismology
and nuclear magnetic resonance. Many efforts have been devoted
to the development of methods that address this problem, ranging
from periodogram related procedures, to subspace and parametric
methods relying on maximum likelihood or Bayesian estimation.
The Bayesian estimation of harmonic signals in white Gaussian
noise has been the subject of many recent papers, see [1], [2], [4],
[5], among others. Here we address the important and more difficult
problem of estimating the frequencies of sinusoids embedded
in non-Gaussian noise, and formulate it in a Bayesian framework.
A commonly used tool to model non-Gaussian distributions consists
of using discrete or continuous mixtures of Gaussian distributions,
and this is the approach adopted here. The motivation for
this choice is that by introducing a proper set of (artificial) missing
data, say $\xi$, one can often design simple and efficient algorithms
that allow for the estimation of important features of the posterior
distribution related to the problem. However, from a statistical
point of view the introduction of missing data can typically lead
to inconsistent estimators, as the number of parameters to be estimated
typically grows with the number of observations. Joint
estimators, i.e. estimators involving $\xi$, should thus be avoided and
marginal estimation of the parameters should be favoured.
For the case of sinusoids embedded in a noise modelled as
a mixture of Gaussians, the analytical expression of the marginal
posterior distribution of interest is of the form
$$p(\mathbf{a}, \boldsymbol{\omega}, \delta \mid \mathbf{y}) = \int p(\mathbf{a}, \boldsymbol{\omega}, \delta, \xi \mid \mathbf{y})\,d\xi,$$
where $\mathbf{a}$, $\boldsymbol{\omega}$ are the amplitudes and pulses of the sinusoids and $\delta$
are parameters of the observation noise. Unfortunately it is not
available in closed form and one has to resort to numerical methods.
Monte Carlo methods, and in particular Markov chain Monte
Carlo (MCMC) methods, have proved to be efficient tools for the
estimation of certain features of complicated posterior distributions,
in particular MMSE (Minimum Mean Square Error) estimates, e.g.
$E((\mathbf{a}, \boldsymbol{\omega}, \delta) \mid \mathbf{y})$ in the case treated here.
C. Andrieu is sponsored by AT&T Laboratories, Cambridge UK. A.
Doucet is sponsored by EPSRC, UK.
However, this choice of estimator is not suitable when the marginal
posterior distribution is multimodal and the MMSE estimate is
located between the modes, possibly in a region of very low probability.
Computing MAP (Maximum a posteriori) estimates of the
frequencies might be preferable in such cases, but whereas MCMC
methods are well adapted to the estimation of marginal posterior
means, their use to perform MMAP (Marginal MAP) estimation
can be questionable. Indeed, in this case further approximations
are introduced by histogram or density estimation methods, which
require careful tuning of extra parameters.
The EM (Expectation Maximization) algorithm is designed to
converge towards a stationary point of the marginal posterior distribution.
It is however limited to certain classes of models for
which the expectation and maximization steps can be performed
conveniently. This is why stochastic versions have been proposed,
such as SEM (Stochastic EM) or MCEM (Monte Carlo EM). Convergence
results are sparse and the algorithms do not always fully
exploit the structure of the statistical model. In this paper we propose
several Monte Carlo methods for performing MMAP estimation of the
frequencies of sinusoids embedded in non-Gaussian noise. The
first method relies on the SAME (State Augmentation for Marginal
Estimation) algorithm [10]. This algorithm is conceptually very
simple and straightforward to implement in most cases, requiring
only small modifications to MCMC code written for sampling
from $p(\mathbf{a}, \boldsymbol{\omega}, \delta, \xi \mid \mathbf{y})$. In order to reduce the computational
complexity of this algorithm, we present a stochastic approximation
type extension of it. We then present an original analysis
of the convergence of the stochastic approximation type algorithm,
which relies on a perturbation analysis of the original SAME
algorithm. Simulation results are presented that demonstrate the
interest of the approach.
This paper is organized as follows. In Section 2 the signal
models are given. In Section 3, we formalize the Bayesian model
and specify the prior distributions. Section 4 is devoted to Bayesian
computation. We propose non homogeneous MCMC algorithms
to perform Bayesian inference for which sufficient conditions for
global convergence can be established. Performance of these algorithms
is illustrated by computer simulations on synthetic data in
Section 6.
2 Problem statement
Let $\mathbf{y} = (y_1, y_2, \ldots, y_T)^T$ be an observed vector of T real data
samples. The elements of $\mathbf{y}$ are the superimposition of k sinusoids
corrupted by noise $\mathbf{n} = (n_1, \ldots, n_T)^T$:
$$y_t = \sum_{j=1}^{k}\left[a_{c_j}\cos(\omega_j t) + a_{s_j}\sin(\omega_j t)\right] + n_t,$$
where $1 \le k \le \lfloor (T-1)/2 \rfloor$, and $a_{c_j}$, $a_{s_j}$ and $\omega_j$ are respectively
the amplitudes and the radial frequency of the jth sinusoid. We
assume that $\boldsymbol{\omega} \in \Omega = \{\boldsymbol{\omega} \in (0, \pi)^k \mid \omega_{j_1} \ne \omega_{j_2} \text{ for } j_1 \ne j_2\}$.
In a vector-matrix form, we have
$$\mathbf{y} = D(\boldsymbol{\omega})\,\mathbf{a} + \mathbf{n},$$
where $[\mathbf{a}]_{2j-1,1} = a_{c_j}$, $[\mathbf{a}]_{2j,1} = a_{s_j}$ and $[\boldsymbol{\omega}]_{j,1} = \omega_j$ for j =
1, ..., k. The T x 2k matrix $D(\boldsymbol{\omega})$ is defined as $[D(\boldsymbol{\omega})]_{t,2j-1} =
\cos[\omega_j t]$ and $[D(\boldsymbol{\omega})]_{t,2j} = \sin[\omega_j t]$ for t = 1, ..., T, and j =
1, ..., k. The noise is assumed white, distributed according to a
mixture of Gaussian distributions, i.e.¹
$$n_t \sim \lambda\,\mathcal{N}(0, \sigma^2) + (1 - \lambda)\,\mathcal{N}(0, \alpha\sigma^2),$$
where $0 < \lambda < 1$ defines the mixture probability, $\sigma^2$ is a global
scale parameter and $0 < \alpha < 1$. It is convenient to introduce the
so-called missing data $r_{1:T}$ such that:
$$n_t \mid r_t \sim \mathcal{N}\left(0, \sigma^2\,\mathbb{I}_{\{1\}}(r_t) + \alpha\sigma^2\,\mathbb{I}_{\{0\}}(r_t)\right),$$
and $\Pr(r_t = 1) = \lambda$ and $\Pr(r_t = 0) = 1 - \lambda$. This allows us to
write the likelihood of the observations
$$p(\mathbf{y} \mid \mathbf{a}, \boldsymbol{\omega}, \lambda, r_{1:T}, \alpha, \sigma^2) = |2\pi\sigma^2\Sigma|^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(\mathbf{y} - D(\boldsymbol{\omega})\mathbf{a})^T\Sigma^{-1}(\mathbf{y} - D(\boldsymbol{\omega})\mathbf{a})\right),$$
where $\Sigma = \mathrm{diag}\left(\mathbb{I}_{\{1\}}(r_j) + \alpha\,\mathbb{I}_{\{0\}}(r_j)\right)$, j = 1, ..., T. Note
that this likelihood is invariant by permutation of the indexes of
the pulses $\omega_j$ if no ordering constraint is introduced, and that consequently
MMSE estimates can lead to very poor results. The parameters
of the sinusoids, of the noise and the missing data, i.e.
$\theta = (\mathbf{a}, \boldsymbol{\omega}, \lambda, r_{1:T}, \alpha, \sigma^2)$, are unknown, and our aim is to estimate
these parameters; $\mathbf{a}$ and $\boldsymbol{\omega}$ being in general the parameters
of primary interest. Note that the strategy developed in this paper
can be extended to the case of continuous Gaussian mixtures, in
order to model heavy tailed distributions, but we do not consider
this case here.
¹ This could be extended to the case of discrete mixtures with more
components.
3 Bayesian Models and Estimation Objectives
In this paper we follow a Bayesian approach where the unknown
parameter vector $\theta$ is regarded as being drawn from an appropriate
prior distribution. This prior distribution reflects our degree of
belief in the relevant values of the parameters. Note that when no
prior knowledge is available, uninformative distributions can
be used [3]. This is the approach we follow here. We first propose
a model that sets up a probability distribution over the space of
possible structures of the signal and we give the estimation aims.
3.1 Prior distribution
We set a prior distribution on the unknown parameter vector $\theta =
(\mathbf{a}, \boldsymbol{\omega}, \lambda, r_{1:T}, \alpha, \sigma^2) \in \Theta$ where $\Theta = \mathbb{R}^{2k} \times \Omega \times (0, 1) \times
\{0, 1\}^T \times (0, 1) \times \mathbb{R}^+$. The following uninformative improper
prior distribution² is selected:
$$p(\mathbf{a}, \boldsymbol{\omega}, \sigma^2 \mid r_{1:T}, \alpha) \propto \left|D^T(\boldsymbol{\omega})\,\Sigma^{-1}D(\boldsymbol{\omega})\right|^{1/2}\mathbb{I}_\Omega(\boldsymbol{\omega}).$$
This prior corresponds to Jeffreys' prior for the linear model [3]. It
penalizes close frequencies as pointed out in [5]. The parameters
$\alpha$ and $\lambda$ are assumed distributed according to $\alpha \sim \mathcal{U}_{(0,1)}$ and
$\lambda \sim \mathcal{U}_{(0,1)}$, which are vague prior distributions.
3.2 Estimation objectives
Given the observations $\mathbf{y}$, Bayesian inference about $\theta$ is based on
the posterior distribution $p(\theta \mid \mathbf{y})$ obtained from Bayes' theorem,
$$p(\theta \mid \mathbf{y}) \propto p(\mathbf{y} \mid \theta)\,p(\theta).$$
Our aim is to estimate this joint distribution from which, by standard
probability marginalization and transformation techniques,
one can "theoretically" obtain all posterior features of interest including
the marginal distributions, posterior modes or conditional
expectations such as the MMSE estimate
$$E[\theta \mid \mathbf{y}] = \int_\Theta \theta\,p(\theta \mid \mathbf{y})\,d\theta,$$
among others. As discussed in the introduction this problem can be
addressed using MCMC methods but the use of these techniques
for the computation of the MMAP estimator $(\hat{\mathbf{a}}, \hat{\boldsymbol{\omega}}, \hat{\sigma}^2, \hat{\alpha})_{MMAP}$
defined as
$$\arg\max_{(\mathbf{a}, \boldsymbol{\omega}, \sigma^2, \alpha) \in \mathbb{R}^{2k} \times \Omega \times \mathbb{R}^+ \times (0,1)} p(\mathbf{a}, \boldsymbol{\omega}, \sigma^2, \alpha \mid \mathbf{y}),$$
can be questionable. In the next section we describe an algorithm
that allows for computation to be performed by adapting MCMC
techniques for MMAP estimation.
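The signal model of Section 2 and the likelihood conditional on the missing data can be sketched in NumPy as follows. The function names, and the way the random generator is passed in, are illustrative assumptions; the likelihood is the Gaussian density with covariance $\sigma^2\Sigma$, evaluated on the log scale.

```python
import numpy as np

def design_matrix(omega, T):
    """T x 2k matrix D(omega): cos columns at even, sin at odd indices."""
    omega = np.asarray(omega, dtype=float)
    t = np.arange(1, T + 1)[:, None]
    D = np.empty((T, 2 * omega.size))
    D[:, 0::2] = np.cos(t * omega)
    D[:, 1::2] = np.sin(t * omega)
    return D

def simulate(a, omega, lam, alpha, sigma2, T, rng):
    """Draw the indicators r_t and data y from the two-component model."""
    r = (rng.random(T) < lam).astype(int)          # Pr(r_t = 1) = lam
    variances = sigma2 * np.where(r == 1, 1.0, alpha)
    noise = rng.standard_normal(T) * np.sqrt(variances)
    return design_matrix(omega, T) @ np.asarray(a, dtype=float) + noise, r

def log_likelihood(y, a, omega, r, alpha, sigma2):
    """log p(y | a, omega, r, alpha, sigma2) for the conditional model."""
    y = np.asarray(y, dtype=float)
    T = y.size
    s = np.where(np.asarray(r) == 1, 1.0, alpha)   # diagonal of Sigma
    e = y - design_matrix(omega, T) @ np.asarray(a, dtype=float)
    return -0.5 * (T * np.log(2 * np.pi * sigma2)
                   + np.sum(np.log(s)) + np.sum(e ** 2 / s) / sigma2)
```

A data set can be drawn with, for example, `simulate([1.0, 0.0], [0.5], 0.8, 0.1, 1.0, 64, np.random.default_rng(0))`; swapping the order of the frequencies together with the matching amplitude pairs leaves `log_likelihood` unchanged, which is the permutation invariance noted in Section 2.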
4 Bayesian Marginal MAP robust spectral estimation
4.1 The SAME algorithm
One might be interested in the marginal MAP estimation of the
frequencies, i.e. finding the maximum of $p(a, \omega, \sigma^2, \alpha \mid y)$. In order
to achieve this we introduce two versions of the SAME algorithm
[10], the second one being a stochastic approximation type
algorithm. Let us consider the extended probabilistic model,
$$p^{\otimes\gamma}(a, \omega, \sigma^2, \alpha, \lambda_{1:\gamma}, r_{1:T,1:\gamma} \mid y) \propto \prod_{j=1}^{\gamma} p(y \mid a, \omega, \sigma^2, \alpha, \lambda_j, r_{1:T,j})\, p(a, \omega, \sigma^2, \alpha, \lambda_j, r_{1:T,j}),$$
where $\gamma$ is a positive integer and $r_{1:T,j}$ is a replica of the missing
data. Clearly this probabilistic model admits $p^{\gamma}(a, \omega, \sigma^2, \alpha \mid y)$
as marginal distribution, where $p^{\gamma}(a, \omega, \sigma^2, \alpha \mid y)$ is the
distribution proportional to $[p(a, \omega, \sigma^2, \alpha \mid y)]^{\gamma}$. Given a sequence
$(\gamma_i)_{i \in \mathbb{N}}$ such that $\lim_{i \to +\infty} \gamma_i = +\infty$, the idea of the SAME
algorithm is to run a non-homogeneous Markov chain that admits
$p^{\otimes\gamma_i}(a, \omega, \sigma^2, \alpha \mid y)$ as invariant distribution at each iteration $i$.
2 A prior distribution $p(\theta)$ is said to be improper if $\int_{\Theta} p(\theta)\, d\theta = +\infty$.
The distribution $p^{\otimes\gamma_i}(a, \omega, \sigma^2, \alpha \mid y)$ concentrates itself on its
set of global maxima as $i \to +\infty$ (this is the idea of simulated
annealing) and the algorithm is thus hoped in practice to converge
towards a global maximum. Note that when $\gamma_i = 1$ for $i > 1$ this
algorithm is a standard MCMC that asymptotically produces
samples from $p(\theta \mid y)$. In practice one can make use of the properties
of the model and analytically integrate out $a$, $\sigma^2$ and $\lambda$, leading to
an expression of $p^{\otimes\gamma}(\omega, \alpha, r_{1:T,1:\gamma} \mid y)$ up to a constant. It can be
shown that
p®7 (w)a,r11T,1.iiy) oc nu lDTs7lDl1/2 Is; r 1/2
x |M7r1/,z [yTP7 {u)y]-^T,2+k)+k
x IlJ=i ni.J • — bij)!i
where mj = Ef=i I{o> (rt,j) and
m7 =
m7 = M7Dt («) 9? = £?=1 S."1.
p7 (W) = vf-71 - (w) m7dt (w)
In order to sample from $p^{\otimes\gamma}(\omega, \alpha, r_{1:T,1:\gamma} \mid y)$, we propose the
following algorithm:
4.2 The SA2ME
In the current version of the SAME algorithm $\gamma_i$ replicas of the
variables $r_{1:T}$ are sampled at each iteration $i$, which can rapidly
become cumbersome as $\gamma_i$ becomes large. Let $t_0$ be an iteration
chosen by the user. Then we propose, from iteration $t_0$ onwards, not
to resample the variables $r_{1:T,1:\gamma_{i-1}}$, which are “frozen” once they
are simulated, but simply to sample the new replicas $r_{1:T,\gamma_{i-1}+1:\gamma_i}$.
The computational gain of this SA2ME (Stochastic Approximation
SAME) is obvious and the analogy with classical stochastic
approximation algorithms is clear, although we here take advantage
of the statistical structure of the problem. However, the algorithm
is no longer a Markov chain, as the update of the parameters at
iteration $i$ depends on the past of the chain up to iteration $i - 1$. In
fact this new algorithm can be viewed as a perturbation of the
original SAME algorithm, and an analysis of these perturbations can
be carried out to prove the validity of the new scheme, as sketched
in the next section.
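The freezing idea can be sketched on a toy problem. The example below is not the sinusoidal model of the paper: it assumes a hypothetical one-dimensional model y = theta + z + e with z ~ N(0, 1) the missing datum, e ~ N(0, 1) and a flat prior on theta, so that the marginal posterior of theta is N(y, 2) with mode y. Before iteration `t0` all replicas of z are resampled; from `t0` onwards the existing replicas are kept and only the newly required ones are drawn. The function name, the schedule endpoints and the starting value are illustrative assumptions.

```python
import random

def sa2me_toy(y, n_iter=500, t0=20, gamma_end=102, seed=0):
    """Toy stochastic-approximation SAME with frozen replicas.

    Returns the last value of theta and the final number of replicas.
    """
    rng = random.Random(seed)
    theta = 0.0
    z = []  # replicas of the missing datum
    for i in range(1, n_iter + 1):
        # Linear schedule from 1 to gamma_end, rounded to an integer count.
        gamma = max(1, round(1 + (gamma_end - 1) * i / n_iter))
        if i < t0:
            z = []  # before t0: discard and resample every replica
        # Draw only the replicas that are missing; after t0 the old ones stay frozen.
        while len(z) < gamma:
            # z | theta, y ~ N((y - theta) / 2, 1/2)
            z.append(rng.gauss((y - theta) / 2.0, 0.5 ** 0.5))
        # theta | z_1..z_gamma, y ~ N(mean_j (y - z_j), 1 / gamma)
        centre = sum(y - zj for zj in z) / len(z)
        theta = rng.gauss(centre, (1.0 / len(z)) ** 0.5)
    return theta, len(z)

theta_hat, n_replicas = sa2me_toy(y=3.0)
print(theta_hat, n_replicas)
```

Each iteration after `t0` costs only the handful of new replicas, at the price of a dependence on the whole past of the chain, which is the trade-off described above.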
5 Convergence analysis
We first point out a convergence result for the SAME algorithm
and then focus on the SA2ME algorithm.
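MCMC algorithm for marginal spectral analysis
1. Initialization: $\theta^{(0)} = \{\omega^{(0)}, \alpha^{(0)}, r^{(0)}_{1:T,1:\gamma_0}\}$ and $i = 1$.
2. Iteration $i$
• For $j = 1, \ldots, \gamma_i$, sample
$r_{t,j}^{(i)}$ from $p^{\otimes\gamma_i}(r_{t,j} \mid y, \omega^{(i-1)}, \alpha^{(i-1)}, r_{-t,j}^{(i-1)})$
for $t = 1, \ldots, T$.
• Sample $\alpha^{(i)} \sim p^{\otimes\gamma_i}(\alpha \mid y, \omega^{(i-1)}, r^{(i)}_{1:T,1:\gamma_i})$.
• Sample $\omega_j^{(i)} \sim p^{\otimes\gamma_i}(\omega_j \mid y, \omega_{-j}^{(i-1)}, \alpha^{(i)}, r^{(i)}_{1:T,1:\gamma_i})$,
$j = 1, \ldots, k$, with an MCMC step.
• Sample $a^{(i)}, \sigma^{2(i)} \sim p^{\otimes\gamma_i}(a, \sigma^2 \mid y, \omega^{(i)}, \alpha^{(i)}, r^{(i)}_{1:T,1:\gamma_i})$.
Where $r_{-t}$ means “$r_{1:T}$ with $r_t$ removed” and similarly for $\omega_{-j}$.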
We comment on the different sampling steps:
• Sampling $r_{t,j}$ is straightforward as it simply involves
sampling from a discrete distribution.
• Sampling $\omega_j$ can be done using an adaptation of the
technique described in [1].
• Sampling $a, \sigma^2$ is standard as it requires simulation from
an inverse-Gamma distribution and a normal distribution.
• Sampling $\alpha$ mainly amounts to sampling from a truncated
inverse-Gamma distribution and can be done efficiently by
using a rejection method based on the work of [8].
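The replica mechanism behind these steps can be sketched on a much simpler problem than the one treated here. The example assumes a hypothetical toy model y = theta + z + e, with missing datum z ~ N(0, 1), noise e ~ N(0, 1) and a flat prior on theta; the marginal posterior of theta is then N(y, 2), whose mode is y. At iteration i, `gamma` independent replicas of z are drawn given theta, and theta is then drawn given all replicas, so the chain targets the marginal posterior raised to the power `gamma`. The linear schedule and its rounding to an integer are assumptions made for the sketch.

```python
import random

def gamma_schedule(i, n_iter=500, gamma_start=1, gamma_end=102):
    """Linear schedule gamma_i = A + B*i, rounded to an integer number of replicas."""
    slope = (gamma_end - gamma_start) / n_iter
    return max(1, round(gamma_start + slope * i))

def same_toy(y, n_iter=500, seed=0):
    """Toy SAME sampler; returns the trace of theta."""
    rng = random.Random(seed)
    theta = 0.0
    trace = []
    for i in range(1, n_iter + 1):
        gamma = gamma_schedule(i, n_iter)
        # Replicas of the missing datum: z_j | theta, y ~ N((y - theta) / 2, 1/2)
        z = [rng.gauss((y - theta) / 2.0, 0.5 ** 0.5) for _ in range(gamma)]
        # Parameter update: theta | z_1..z_gamma, y ~ N(mean_j (y - z_j), 1 / gamma)
        centre = sum(y - zj for zj in z) / gamma
        theta = rng.gauss(centre, (1.0 / gamma) ** 0.5)
        trace.append(theta)
    return trace

trace = same_toy(y=3.0)
print(trace[0], trace[-1])  # the late iterates cluster around the marginal mode y
```

With `gamma` fixed to 1 the same loop is an ordinary Gibbs sampler for the joint posterior of theta and z; increasing `gamma` sharpens the target around the marginal mode.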
5.1 SAME algorithm
First we set $\theta_1 = \{a, \omega, \sigma^2, \alpha\}$ and $\theta_2 = \{\lambda, r_{1:T}\}$ and name
their state spaces $\Theta_1$ and $\Theta_2$. The SAME algorithm defines a
Markov chain on $\theta_1$, and it can be proved that this Markov chain
is uniformly ergodic for a constant sequence $\gamma_i$, i.e. for any
probability distribution $\mu$,
$$\lim_{n \to +\infty} \| \mu K_i^n(d\theta_1) - p^{\gamma_i}(d\theta_1) \| = 0,$$
at a geometric rate independent of the initial condition, where $\|\cdot\|$
is the total variation norm. Here $K_i$ is the transition kernel of the
SAME algorithm at iteration $i$, which can be formally written as
$$K_i(\theta_1^{(i-1)}; d\theta_1^{(i)}) \propto \int_{\Theta_2^{\gamma_i}} p(d\theta_1^{(i)} \mid \theta_{2,1:\gamma_i}^{(i)}) \prod_{j=1}^{\gamma_i} p(d\theta_{2,j}^{(i)} \mid \theta_1^{(i-1)}).$$
This convergence result mainly relies on the fact that the
parameters $\theta_1$ and $\theta_2$ lie in bounded sets. From this result, and following
arguments similar to those used to prove the convergence of
simulated annealing, it can be shown that for a logarithmic sequence
$\gamma_i$ the SAME algorithm for MMAP estimation converges in the
following sense:
$$\lim_{n \to +\infty} \| \mu K_1 K_2 \cdots K_n(d\theta_1) - p^{\gamma_n}(d\theta_1) \| = 0.$$
Furthermore, as the sequence $p^{\gamma_i}(d\theta_1)$ tends to a mixture of delta
functions located at the global maxima of $p(\theta_1)$, we conclude that
the algorithm will asymptotically provide us with an estimate of
$$\theta_{1,MMAP} = \arg\max_{\theta_1 \in \Theta_1} p(\theta_1).$$
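The concentration of the powered distribution on the global maximum can be checked numerically. The sketch below uses a hypothetical four-point discrete distribution (not taken from the paper) and renormalizes its $\gamma$-th power; the helper name `tempered` is an illustrative choice.

```python
def tempered(probabilities, gamma):
    """Return the distribution proportional to p**gamma on a finite set."""
    powered = [p ** gamma for p in probabilities]
    total = sum(powered)
    return [q / total for q in powered]

p = [0.05, 0.30, 0.40, 0.25]  # hypothetical distribution, mode at index 2
for gamma in (1, 5, 25, 100):
    print(gamma, [round(q, 4) for q in tempered(p, gamma)])
```

As `gamma` grows, nearly all the mass moves to the most probable point, which is the discrete analogue of the mixture of delta functions mentioned above.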
This elegant algorithm allows one to sample from the sequence of
distributions of interest, and convergence results can be proved that
support the validity of the approach (see Section 5). However,
as $\gamma_i$ approaches infinity the computational burden of
the algorithm rapidly becomes unrealistic. Thus we propose here
a stochastic approximation adaptation of the algorithm presented
above, which is computationally much cheaper.
5.2 SA2ME
The proof of convergence of the algorithm relies on an analysis of
the perturbations introduced by the new scheme upon the original
SAME algorithm. We sketch here the proof, outline the main
propositions that lead to the convergence result and
explain their intuitive meaning. We introduce some notation that
will be useful throughout the proof. We introduce the transition
probability corresponding to the SA2ME algorithm
Ki+i ^1:7i), e[i]-,depi+lni+1\ d9\i+1)^j
oc p ^d^i+1)| P (47i+1:7i+l)| ,
Here we simply express the fact that the missing data $\theta_2^{(1:\gamma_i)}$ are
“frozen” once they are simulated. In order to study the convergence
properties of the second algorithm it will be useful to introduce,
for some integer $k$, the transition kernel of the algorithm for
which only the missing data up to iteration $k - 1$, $\theta_2^{(1:\gamma_{k-1})}$, are
frozen, and missing data from then on, $\theta_2^{(\gamma_{k-1}+1:\gamma_i)}$, are
sampled at each iteration. More precisely, for $i > k$ we define
In order to study the convergence properties of our algorithm, we
will need notation to combine these kernels, namely,
pKun (d9[n\d9iyn-1 + 1:ynA = fBnx9ln-i P (dC)
x^i (e(°);de[1),de(21)^ k2 (e^\e^-,def\d8fn2)^
... x Kj (e[i-1\ei1:v-1^,de[j\de^i-'+u',iA x ...
... xkn ^n-1),^1:7n-^;d0("),<i^7n-,+1:7n)^ ,
and for k,j>m
pKuk-iKk:n,m {det\de^) = /e?xe,m_. p (de[0))
xKi (0[o);d0[1)d$;1)) k2 (i9[1),^1);d^2),d^2:72)j ...
/glm-lm-l ■■■/eji-'m-l
... x kj<m ^e[j-1\eilnm-1);de{i\d6^m-i+1:ii^
... X Kn,m ^-1)41^-1);d^)1d4^-1+1:7”)) .
Now that notation is defined we can express the main result of
this section. We want to study the asymptotic properties of the
difference of the two stochastic processes. More precisely, writing
$K$ for the kernels of the SAME algorithm, $\widetilde K$ for those of the
SA2ME algorithm and $K_{i,m}$ for the kernel in which only the missing
data simulated up to iteration $m$ are frozen, we want
to prove that under certain conditions, for any probabilities $\nu$ and $\mu$,
$$\lim_{n \to +\infty} \| \nu K_{1:n} - \mu \widetilde K_{1:n} \| = 0.$$
A trivial decomposition and the application of the triangle
inequality lead to
$$\| \nu K_{1:n} - \mu \widetilde K_{1:n} \| \le \| \nu K_{1:n} - \mu K_{1:n} \| + \| \mu K_{1:n} - \mu \widetilde K_{1:n} \|.$$
From the result of the previous subsection, the SAME algorithm is
ergodic and thus the first term goes to zero as $n \to +\infty$.
Consequently we focus on the second term.
Our results are based upon a decomposition into an estimation
error and an approximation bias, which we now state:
Proposition 1 For all integers $m_n$ and $n$ such that $m_n < n$, we
have the estimate
$$\| \mu K_{1:n} - \mu \widetilde K_{1:n} \| \le \| \mu K_{1:n} - \mu \widetilde K_{1:m_n} K_{m_n+1:n,m_n} \| + \sum_{k=m_n+1}^{n} \| \mu \widetilde K_{1:k-1} K_{k,m_n} - \mu \widetilde K_{1:k} \|.$$
Proof. For $m_n < n$ we have the telescoping sum
$$\mu K_{1:n} - \mu \widetilde K_{1:n} = \mu K_{1:n} - \mu \widetilde K_{1:m_n} K_{m_n+1:n,m_n} + \sum_{k=m_n+1}^{n} \left( \mu \widetilde K_{1:k-1} K_{k:n,m_n} - \mu \widetilde K_{1:k} K_{k+1:n,m_n} \right),$$
with the convention $K_{n+1:n,m_n} = Id$. Then, by applying the
triangle inequality and the fact that for any probability measures
$\mu$ and $\nu$ the inequality $\| \mu K_{k:n,m_n} - \nu K_{k:n,m_n} \| \le \| \mu - \nu \|$
holds, we obtain the result. ∎
Proposition 2 There exists a sequence $m_n$ such that
$$\| \mu K_{1:n} - \mu \widetilde K_{1:m_n} K_{m_n+1:n,m_n} \| \to 0.$$
Intuitively, during the first $m_n$ iterations $\widetilde K_{1:m_n}$ introduces an
approximation error compared to the SAME algorithm, which is then
corrected in the following $n - m_n + 1$ iterations with $K_{m_n+1:n,m_n}$.
If $m_n$ increases sufficiently more slowly than $n$, so that
$K_{m_n+1:n,m_n}$ can correct and forget in $n - m_n + 1$ iterations the
error generated during the first $m_n$ iterations, then the result should
hold.
Proposition 3 There exists a sequence $m_n$ such that
$$\lim_{n \to +\infty} \sum_{k=m_n+1}^{n} \| \mu \widetilde K_{1:k-1} K_{k,m_n} - \mu \widetilde K_{1:k} \| = 0.$$
This result relies on the fact that for term $k$ in the sum, the two
dynamics are the same up to time $k - 1$ and simply differ at iteration
$k$ where, on one hand, the missing data simulated since iteration $m_n$
are “rejuvenated” with $K_{k,m_n}$ and, on the other hand, only the new
replicas of $\theta_2$ are sampled with $\widetilde K_k$. When $\theta_1$ and
$\theta_2$ lie in bounded spaces one can bound the error introduced, and
show that there exists $0 < \theta < 1$ such that for $m_n = n - n^{\theta}$ the
sum of these errors goes to zero as $n \to +\infty$.
By combining the three propositions and using the convergence
result proved for the SAME algorithm we can deduce the
following result:
Theorem 4 There exist sequences $m_n$ and $\gamma_n$ such that for any
$\mu$,
$$\lim_{n \to +\infty} \| p^{\gamma_n} - \mu \widetilde K_{1:n} \| = 0,$$
which proves the validity of the SA2ME algorithm under suitable
conditions. Note that these results rely on a boundedness
assumption on the parameters. We are currently extending these results to
more general cases for other problems.
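The proof of Proposition 1 uses the fact that applying a Markov kernel to two probability measures cannot increase their total variation distance. A minimal numerical check on a hypothetical three-state kernel (the matrix and the two initial distributions are illustrative, not taken from the paper) is sketched below; the distance is taken here as the L1 norm of the difference, without the factor 1/2.

```python
def total_variation(p, q):
    """L1 distance between two probability vectors on the same finite set."""
    return sum(abs(a - b) for a, b in zip(p, q))

def push(dist, kernel):
    """Apply a row-stochastic kernel to a row probability vector."""
    n_out = len(kernel[0])
    return [sum(dist[i] * kernel[i][j] for i in range(len(dist)))
            for j in range(n_out)]

K = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.1, 0.1, 0.8]]
mu = [1.0, 0.0, 0.0]
nu = [0.0, 0.0, 1.0]

before = total_variation(mu, nu)
after = total_variation(push(mu, K), push(nu, K))
print(before, after)  # the distance does not increase after one kernel step
```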
6 Simulation results
We applied the two algorithms described above for the following
parameters: $T = 64$ and $k = 2$. We define $E_i = a_{c_i}^2 + a_{s_i}^2$, with $E_1 = 20$,
$E_2 = 6.32$, $-\arctan(a_{s_1}/a_{c_1}) = 0$, $-\arctan(a_{s_2}/a_{c_2}) = \pi/4$,
$\omega_1/2\pi = 0.2$ and $\omega_2/2\pi = 0.3$. The SNR is defined
as $10 \log_{10} E_1/(2\sigma^2)$ and is equal to 1 dB. Theoretically, the
algorithms require a so-called logarithmic cooling schedule $\gamma_i$ and an
infinite number of iterations to converge. This sequence goes to
$+\infty$ too slowly to be used in practice. We run the algorithms here
for 500 iterations and select a linearly growing cooling schedule
$\gamma_i = A + Bi$ where $\gamma_0 = 1$ and $\gamma_{500} = 102$. We used the
same sequence $\gamma_i$ for the second algorithm and set $t_0 = 20$. Note the
slower convergence of the second algorithm compared with the
first one, as expected.
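A test signal with these settings can be generated as sketched below. Several points are assumptions: the signal is taken to be a sum of terms a_c cos(ωt) + a_s sin(ωt), the SNR is read as 1 dB, and the noise is drawn as plain Gaussian noise rather than from the mixture model of the paper. The function name and the seed are illustrative.

```python
import math
import random

def simulate_sinusoids(T=64, energies=(20.0, 6.32), phases=(0.0, math.pi / 4),
                       freqs=(0.2, 0.3), snr_db=1.0, seed=0):
    """Return (y, sigma2): noisy sum of sinusoids and the noise variance used."""
    rng = random.Random(seed)
    # SNR = 10 log10(E_1 / (2 sigma^2))  =>  sigma^2 = E_1 / (2 * 10^(SNR / 10))
    sigma2 = energies[0] / (2.0 * 10.0 ** (snr_db / 10.0))
    y = []
    for t in range(T):
        clean = 0.0
        for energy, phase, freq in zip(energies, phases, freqs):
            a_c = math.sqrt(energy) * math.cos(phase)
            a_s = -math.sqrt(energy) * math.sin(phase)  # so that -arctan(a_s / a_c) = phase
            w = 2.0 * math.pi * freq
            clean += a_c * math.cos(w * t) + a_s * math.sin(w * t)
        # Assumption: Gaussian noise only, no outlier component.
        y.append(clean + rng.gauss(0.0, math.sqrt(sigma2)))
    return y, sigma2

y, sigma2 = simulate_sinusoids()
print(len(y), round(sigma2, 3))
```

At this SNR the noise variance is of the same order as the energy of the weaker sinusoid, which makes the second frequency hard to locate from the raw data alone.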
Figure 1 : Convergence of the SAME towards the marginal MAP
estimates of the frequencies
Figure 2: Convergence of the SA2ME algorithm towards the
marginal MAP estimates of the frequencies
7 REFERENCES
[1] C. Andrieu and A. Doucet, “Joint Bayesian Detection and
Estimation of Noisy Sinusoids via Reversible Jump MCMC,”
IEEE Trans. Signal Processing, vol. 47, no. 10, pp. 2667-2676, 1999.
[2] P. Barone, R. Ragona, “Bayesian estimation of parameters of
a damped sinusoidal model by a Markov chain Monte Carlo
method,” IEEE Trans. Sig. Proc., 45 (7) (1997) 1806-1814.
[3] J.M. Bernardo, A.F.M. Smith, Bayesian Theory, Wiley series
in Applied Probability and Statistics, 1994.
[4] G.L. Bretthorst, “Bayesian Spectrum Analysis and Parameter
Estimation,” Lecture Notes in Statistics, vol. 48, Springer-Verlag,
New York, 1988.
[5] P.M. Djuric, H. Li, “Bayesian spectrum estimation of harmonic
signals,” IEEE Sig. Proc. Letters, 2 (11) (1995) 213-215.
[6] A. Doucet and C. Andrieu, “Robust Bayesian spectral analysis
using MCMC,” in Proc. EUSIPCO’98, Island of Rhodes,
Sept. 1998.
[7] E.T. Jaynes, “Bayesian Spectrum and Chirp Analysis,” in
Maximum Entropy and Bayesian Spectral Analysis and Estimation
Problems, Ed. D. Reidel, Dordrecht-Holland, 1987,
1-37.
[8] A. Philippe, “Simulation of right and left truncated gamma
distributions by mixtures,” Statistics and Computing, 1,
(1997), 173-181.
[9] D.C. Rife, R.R. Boorstyn, “Multiple-tone parameter estimation
from discrete-time observations,” Bell Syst. Tech. J., 55
(1976) 1389-1410.
[10] C.P. Robert, A. Doucet and S.J. Godsill, “Marginal Maximum
A Posteriori Estimation using MCMC,” Proc. IEEE
ICASSP’99.
[11] P. Stoica, R.L. Moses, B. Friedlander, T. Soderstrom, “Maximum
likelihood estimation of the parameters of multiple
sinusoids from noisy measurements,” IEEE Trans. Acou.
Speech Sig. Proc., 37 (1989) 378-392.
HARMONIC ANALYSIS ASSOCIATED WITH SPATIO-TEMPORAL
TRANSFORMATIONS.
Jean-Pierre Leduc
Washington University in Saint Louis, Department of Mathematics
One Brookings Drive, P.O. Box 1146, Saint Louis, MO 63130
Email: leduc@math.wustl.edu
ABSTRACT
The paper presents new developments in harmonic analysis
associated with the motion transformations embedded
in digital signals. In this context, harmonic analysis
provides motion analysis with a complete theoretical construction
of perfectly matching concepts and a related toolbox
leading to fast algorithms. This theory can be built from
only two assumptions: an associative structure for the local
motion transformations, expressed as a Lie group, and a
principle of optimality for the global evolution, expressed
as a variational extremal. Motion analysis means not only
detection, estimation, interpolation, and tracking but also
propagators, motion-compensated filtering, signal decomposition,
and selective reconstruction. The optimality principle
defines the trajectory and provides the appropriate
equations of motion, the selective tracking equations, the
selective constants of motion to be tracked, and all the
symmetries to be imposed on the system. The harmonic
analysis provides new special functions, orthogonal bases,
PDEs, ODEs and integral transforms. The tools to be developed
rely on group representations, continuous and discrete
wavelets, estimation theory (prediction, smoothing
and interpolation) and filtering theory (Kalman filters,
motion-based convolutions, integral transforms). All the
algorithms are supported by fast and parallelizable implementations
based on the FFT and dynamic programming.
1. INTRODUCTION
In this paper, the harmonic analysis on motion transformations
is built on the actual kinematics as they take place in
the external space and in the projections on sensor arrays
(Figure 1). Eventually, they are embedded in the signals to
analyze. From that point of view, this approach fundamentally
differs from the motion models currently presented in
the literature (see [1] and the references therein), which rely
on techniques based on stochastic processes, statistics and
operations research, namely block-matching, pel-recursive
and Bayesian techniques. As a major drawback,
these techniques are totally blind to the underlying mathematical
structures of the spatio-temporal transformations.
The author wants to thank Prof. B. Blank in the Math.
Dept. for helpful discussions and Prof. B. K. Ghosh in the SSM
Dept. for his support on numerical computations. This research
work was supported by the AFOSR grant No. F49620-99-1-0068.
The main point of the approach proposed in this paper
is to bring differential geometry, mechanics on manifolds
and harmonic analysis into signal analysis. This theory
provides the actual kinematics and relies on only two key
assumptions that can be summarized as follows: a Lie group
structure (i.e. an associative law of composition, and an
identity element for the local transformations) and a principle
of optimality (for the global evolution). From those
two key points, a complete machinery of theory, analysis
tools and fast algorithms can be constructed in such a
way that all the concepts match each other.
This paper presents new developments on this important
topic that cover all the kinematics embedded in any
spatio-temporal real and complex signals and apply to video, radar
and sonar.
The construction of Lie group representations (i.e. the
analyzing functions in the signal space) leads naturally to
several important topics. First, this leads to the existence
of continuous wavelet transforms with frames, tight frames
and new discrete wavelets placed along the trajectories, which
perform spatio-temporal and motion-based atomic decompositions,
expansions, filtering (prediction, smoothing and
interpolation), estimation and motion-selective reconstructions.
The second topic deals with the characters of the
group representations to define new special functions and
integral transforms (IT) which generalize the Fourier kernel
for the new kinematics of interest. The third topic proceeds
with the adjunction of a principle of optimality based on
Euler-Lagrange equations and defines the existence of a
trajectory and a tracking. This gives rise to Partial Differential
Equations (PDE) as equations of wavelet and signal
motion and to Ordinary Differential Equations (ODE) for
tracking. Fourth, the Green functions associated with these
PDEs turn out to be the previous special functions related
to the kinematics. At this stage, we obtain a global analysis
structure with the construction of signal propagators and
motion-compensated filters.
2. GROUP REPRESENTATIONS, WAVELETS
AND CONVOLUTIONS
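In their general form, the Lie group representations $T_g$ acting
upon functions $\hat\Psi \in L^2(\mathbb{R}^n \times \mathbb{R}, dk\, d\omega)$ read
$$[T_g \hat\Psi](k, \omega) = a^{n/2}\, e^{i(\omega\tau + k \cdot b)}\, \hat\Psi[g^{-1}(k, \omega)] \qquad (1)$$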
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
where $g$ is an element of the group $G$, the $L^2$ normalizing
factor $a^{n/2}$ originates from a Radon-Nikodym derivative
and provides unitary representations, $e^{i(\omega\tau + k \cdot b)}$ stands for
the character of the subgroup of spatio-temporal translations,
and $g^{-1}(k, \omega)$ is the left-group action of $g \in G$ in the
dual space. The dual space, also called the phase space,
is the Fourier domain, denoted by a hat, with spatial frequencies
$k \in \mathbb{R}^n$ and temporal frequency $\omega \in \mathbb{R}$. The parameters
$a \in \mathbb{R}^{+} \setminus \{0\}$, $b \in \mathbb{R}^n$, and $\tau \in \mathbb{R}$ are respectively the scale,
the spatial and temporal translations.
From the group representations, we define the continuous
wavelet transform as the operator $W_\Psi$, mapping the
function $S \in H = L^2(\mathbb{R}^n \times \mathbb{R})$ into functions of $g$ defined
as
and moving at the constant velocity $v$. The convolution performed
along this displacement (i.e. along the trajectory)
allows the reconstruction of the still signal $F(x,t)$. This
property is in fact reminiscent of the motion-compensated
filtering developed by the author in [3], which is
generalized in this work by the introduction of ITs. Finally,
let us move to the Fourier domain and retrieve the
usual condition of admissibility for the Galilean wavelet as
described in [9, 12, 13]. Proceeding with Equation (5) in
the Fourier domain, we obtain
F(k,u>) = F(k,uj) f f |\k (a k , w — k v) |2 fa
Jr J r+\{H} a
(7)
which leads to the usual condition of square-integrability of
$$[W_\Psi S](g) = \int_{\mathbb{R}^n \times \mathbb{R}} S(x,t)\, [T_g\Psi](x,t)\, d^n x\, dt = \langle S \mid T_g\Psi \rangle \qquad (2)$$
This inner product $\langle \cdot \mid \cdot \rangle$ would remain a simple correlator
between $[T_g\Psi](x,t)$ and $S(x,t)$ if no further conditions
were imposed on the unitary and irreducible group representations.
In fact, to be a continuous wavelet transform,
the mapping must be invertible, i.e. there exists an
operator $W_\Psi^{-1}$ such that $W_\Psi^{-1} W_\Psi = I_H$, where $I_H$ is the identity
operator in the Hilbert space of observation $H$. This means
that we want to perfectly reconstruct the signal
$$S(x,t) = \int_G [W_\Psi S](g)\, [T_g\Psi](x,t)\, d\mu(g) \qquad (3)$$
where $d\mu$ is the left-invariant Haar measure calculated on the
group $G$. The condition to be fulfilled in order to derive
the inverse transform has been known since 1964 from the work of
Calderon. Several examples considered in this paper are defined
in [4, 5, 6, 7]. The simplest case is the affine-Galilean
group where the group element is $g = \{b, \tau, v, a\}$ and $v \in \mathbb{R}^n$
is the velocity vector [6]. The left-group action is given
by $(a^{-1}[x - b - v(t - \tau)],\ t - \tau)$ and the representation in
Equation (1) reads $[T_g\hat\Psi](k, \omega) = a^{n/2}\, e^{i(\omega\tau + k \cdot b)}\, \hat\Psi(k', \omega')$
with $k' = ak$, $\omega' = \omega + k \cdot v$. Let us examine the condition
for an invertible transform in the affine-Galilean case with
$n = 1$, i.e. $b, \tau, v \in \mathbb{R}$ and $a \in \mathbb{R}^{+} \setminus \{0\}$, as follows
the Galilean wavelet in one-dimensional space and time
$$\int_{\mathbb{R}} \int_{\mathbb{R}} \frac{|\hat\Psi(k,\omega)|^2}{|k|^2}\, dk\, d\omega = 1 \qquad (8)$$
See references [4, 5, 6, 7] for the properties and applications
of the Galilean wavelets.
The construction of orthonormal bases proceeds by discretizing
the group parameters into a lattice. The spatio-temporal
lattice is easily defined as a generalization of
the discretization of the affine group: $a = a_*^m$, $b = n_b b_* a_*^m$,
$v = n_v v_* a_*^m$, $\tau = n_\tau \tau_*$, with $a_* > 1$ and $b_*, v_*, \tau_* > 0$ for
convenience. If we now consider the regular left-composition
$g^{-1}(x,t) = (a^{-1}[x - b - v(t - \tau)],\ t - \tau)$ in the Galilean case, we can
mimic the case of the affine group [6] as follows. Let $a_* = 2$
and $T_g\Psi(x,t) = 2^{-m/2}\, \Psi(2^{-m}x - n_b b_* - n_v v_*(t - n_\tau \tau_*),\ t - n_\tau \tau_*)$,
where we retrieve the well-known orthonormal
bases $\Psi_{m,p,q}(x,t) = 2^{-m/2}\, \Psi(2^{-m}x - p,\ t - q)$ in $L^2(\mathbb{R} \times \mathbb{R})$
at $p = n_b b_* + n_v v_* n_\tau \tau_*$, $q = n_\tau \tau_*$ with $p, q \in \mathbb{Z}$.
Technically, we have deployed the usual discrete wavelets defined
from the affine group along spatio-temporal translations
that correspond to the motion trajectories at constant
velocity [3].
3. SPECIAL FUNCTIONS AND INTEGRAL
TRANSFORMS
<F|r9*> (TgV)(X) dX,(g) (4)
which becomes after some easy computations
[[ if f(»)
JrJr+ [JrxR \ P /
(^a *t, $a)
(x-y)- v(t - p)
t~ P
dy dp | ~~F~(5)
LL {F’" <'t' *■ ?-)}( ’ ) ^
where we have let $\check\Psi(x,t) = \Psi(-x, -t)$ and $\Psi_a(x,t) = a^{-1}\Psi(x/a, t)$.
Let us make an important remark about Equations (5) and (6):
the introduction of a non-conventional
spatio-temporal convolution denoted $*_v$ is in fact a convolution
twisted along the Galilean transformation, i.e. the
translation in space has a component depending on time
In this section, we proceed one step further on the representations
and focus on the characters. The integration of the
characters leads to special functions. These special functions
naturally define the kernel of an integral transform.
This procedure can be performed for each group of
spatio-temporal transformations. Let us consider an important
example known as rotational motion (described in [5]). The
set of parameters is given as $G = \{g \mid g = (b, \tau, \theta_1, a)\}$ where
$\theta_1 \in \mathbb{R}$ is the angular velocity. The composition law is
given as $g \circ g' = \{b + aR(\theta_1\tau)b',\ \tau + \tau',\ \theta_1 + \theta_1',\ aa'\}$;
the inverse element reads as $g^{-1} = \{-a^{-1}R(\theta_1\tau)^{-1}b,\ -\tau,\ -\theta_1,\ a^{-1}\}$.
The group representations $[T(g)\hat\Psi](k, \omega)$
in polar coordinates $b \to (r, \theta_b)$ and $k \to (k, \theta_k)$ with $n = 2$
read
0? J in ein(8k + 6b) + kT sinM) $(afc, 0k, 6i ft)
0)
with x = Ob — Ok+diT and f2 = . The characters of this
representation lead to the special functions (Figure 4)
$$J_\Omega(k) = \frac{1}{2\pi} \int_0^{2\pi} e^{i[\Omega u + k \sin u]}\, du \qquad (10)$$
which are usually NOT Bessel functions except when $\theta_1$
takes integer values. The complexification $u \to iy$
gives rise to hyperbolic motion instead of circular rotations,
along with new special functions as in (10) but with real
exponential and sinh functions instead. These special functions can
also be easily obtained by considering $\Psi$ as a Dirac measure
and integrating this measure along the trajectory. This
process corresponds to “mechanics of moving points” and
defines the spectral signatures of objects moving according
to such a transformation. The usual way to deduce the ODE
which admits this special function as solution is to calculate
the Laplace-Beltrami differential operator on this group.
Theorems of additivity for these special functions can be
deduced from the composition of the translations. In this
case, it reads
/+oo
•OO
ri) J[t-n2](k r2) dt = «/[n1-n2][fc(ri +r2)]
(11)
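The integral in Equation (10) can be evaluated numerically. The sketch below assumes the form $(1/2\pi)\int_0^{2\pi} e^{i[\nu u + k \sin u]} du$, with `order` standing for the order parameter $\nu$, and uses a plain trapezoid rule; the function name and the number of quadrature points are illustrative choices. For integer order the result coincides with a classical Bessel function up to sign, since this integral equals $J_{-\nu}(k) = (-1)^{\nu} J_{\nu}(k)$; for non-integer order the integrand is no longer periodic and the value is in general complex.

```python
import cmath
import math

def motion_special_function(order, k, n_points=4000):
    """Trapezoid-rule value of (1/2pi) * int_0^{2pi} exp(i[order*u + k*sin(u)]) du."""
    def integrand(u):
        return cmath.exp(1j * (order * u + k * math.sin(u)))

    h = 2.0 * math.pi / n_points
    total = 0.5 * (integrand(0.0) + integrand(2.0 * math.pi))
    for m in range(1, n_points):
        total += integrand(m * h)
    return total * h / (2.0 * math.pi)

value_order0 = motion_special_function(0, 1.0)    # equals the Bessel value J_0(1)
value_order1 = motion_special_function(1, 1.0)    # equals -J_1(1)
value_half = motion_special_function(0.5, 1.0)    # non-integer order: complex-valued
print(value_order0, value_order1, value_half)
```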
Equation 10 leads to “Hankel-like” integral transforms
Wof](k)
poo
= / /,
Jo
(r) Jh(fc r) r” 1 dr (12)
The same procedure and computations can be done on all
the groups dealing with spatio-temporal transformations
defined in [4, 5, 6, 7]. Examples on the Galilean group
[6, 7] proceed with
fR e~iuT dr = 5{yJ + fct)) e«(—0f
on the acceleration group [4] with
fR e~iwT
= es4 eikb eik^r2 «*22’
where $\gamma_2 \in \mathbb{R}$, on the deformations [8] with
In *-iUT e~ike“1>xdr = ± F (-<£)
where $s_1 \in \mathbb{R}$ and $\Gamma(\cdot)$ is the usual Gamma function.
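The Galilean example can be checked on a discrete grid: a unit impulse translating at constant velocity has a spectrum supported on the line $\omega + kv = 0$. The sketch below assumes a periodic grid of `n` samples in space and time and an integer velocity `v` in samples per frame (both hypothetical choices), and computes the two-dimensional DFT directly.

```python
import cmath
import math

def dft2_moving_impulse(n=16, v=2):
    """2-D DFT of s[t][x] = 1 where x == v*t (mod n), 0 elsewhere.

    Returns a dict mapping (k, w) to the complex DFT coefficient.
    """
    spectrum = {}
    for k in range(n):
        for w in range(n):
            total = 0j
            for t in range(n):
                x = (v * t) % n  # position of the impulse at frame t
                total += cmath.exp(-2j * math.pi * (k * x + w * t) / n)
            spectrum[(k, w)] = total
    return spectrum

n, v = 16, 2
spectrum = dft2_moving_impulse(n, v)
support = sorted(kw for kw, value in spectrum.items() if abs(value) > 1e-9)
print(support)  # only pairs with (k*v + w) % n == 0 remain
```

The nonzero coefficients lie exactly on the discrete line $w \equiv -kv \pmod n$, the sampled counterpart of the Dirac distribution $\delta(\omega + kv)$.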
4. PRINCIPLE OF OPTIMALITY AND
TRACKING
According to the calculus of variations, the motion between
times $t_1$ and $t_2$ coincides with the extremal of the functional
$J$
$$\delta J = 0 \quad \text{with} \quad J = \int_{t_1}^{t_2} L[q(t), \dot q(t), \ddot q(t), \ldots, t]\, dt, \qquad (13)$$
where $\delta$ stands for the variation. The application of the
optimal variational principle in Equation (13) is equivalent
to writing the so-called Euler-Lagrange equation [7]. The
trajectory is then uniquely defined if the initial state $q(0) = q_0$
of the object is known. At the extremum, denoted by
the subscript $*$, the Euler-Lagrange equation reads
$$\frac{d}{dt} \frac{\partial L}{\partial \dot q_*} = \frac{\partial L}{\partial q_*} \qquad (14)$$
This Euler-Lagrange equation generalizes quite easily and,
moreover, allows us to derive the equation of wavelet motion
that optimizes the action $J$. If we consider the Galilean case
with one-dimensional space, with $q(\tau) = b(\tau)$ and $\dot q(\tau) = \dot b(\tau)$,
and the inner product (2) as Lagrangian, then (14) becomes
$$\frac{d}{d\tau} \frac{\partial \langle \Psi_g \mid S \rangle}{\partial \dot b} - \frac{\partial \langle \Psi_g \mid S \rangle}{\partial b} = 0 \qquad (15)$$
It is convenient to expand the total differential. The conditions
to introduce the operator in the integral are fulfilled.
One solution of this IT is that the kernel be equal to 0.
This gives a PDE on $\hat\Psi(ak, \omega - k\dot b)$, i.e. the motion equation
for the wavelet. In the Fourier domain the PDE operator
is given by
( bk + u))(
duo
l<L\-h\Ldud
i>dk 1 Li>2 dk dui
and the PDE by Aw, 'f(afc,w — kb) = '5f(ak,u) — kb).
There are many applications out of this procedure which
can be similarly drawn for each spatio-temporal group. Two
are examined below and a third in Section 5.
If we consider a wavelet tuned on parameter gi and
the Dirac measure on parameter g2, the partial differential
operator ^ becomes n(t,lll)a, 4l.fc)W)
(vik+ui)(
d
V2 — Vi dk
)+* V\
1 d t 1 d2 '
(v2 — Ui)2 dk V2 — ui dk2
(17;
and the PDE becomes an ODE i.e. the tracking equation
H(WJ -,k,u) ^((ak}-k(v 2 - Vi)) = $(ofc,-/i:(t)2 - m)).
Let us consider a Galilean Morlet wavelet [6, 7] applied
to a Dirac measure in pure translational motion at constant
velocity. The signal, taken as a Dirac measure on a
translational trajectory, is given by $S(x,t) = \delta[x - vt]$. After
integrating the inner product, the Lagrangian
$\Phi[b, \tau, v \mid k_0, \omega_0] = \langle \Psi_g \mid S \rangle$ reads $\Phi[b, \tau, v \mid k_0, \omega_0] =$
y^27T eliko(l>-bT)] g[- J{(6T-i))2+T2}]e[-tw0r]
g[J{(fcof-*:o&+wo)2-((^-b)(i'T-6)-r)2}]
e(i{(*:ot>-fcot+wo)((ti-6)(fcr-6)-r)}]
(18)
$k_0$ and $\omega_0$ are the coordinates of the wavelet shift in the
Fourier domain. The contribution of all the partial derivatives
involved in the Euler-Lagrange equation leads
to an ODE in the form of a product of $F$, which is a complex
function of the constants of motion $\dot b\tau - b$ and $\ddot b\tau^2 - 2b$,
with the Lagrangian $\Phi[b, \tau, v \mid k_0, \omega_0]$
$$F[\dot b\tau - b,\ \ddot b\tau^2 - 2b]\; \Phi[b, \tau, v \mid k_0, \omega_0] = 0 \qquad (19)$$
such that $F(0, 0) = 0$. The ODE vanishes when $v = \dot b$, $\dot b\tau - b = 0$,
$\ddot b\tau^2 - 2b = 0$ and $\omega_0 = 0$, $k_0 \neq 0$. Therefore, we have
verified that the tracking addresses the correct constants of
motion $b = \dot b\tau$ and $b = \dot b\tau + \frac{1}{2}\ddot b\tau^2$, meaning that the system
can track objects at constant velocity and constant second-order
acceleration. The tracking requires some symmetry
in the wavelet, i.e. the still wavelet must be located in
the plane $\omega = 0$ with $k_0 \neq 0$. These practical results have
algorithmic importance as pictured in [7].
5. MOTION-BASED FILTERING
This section extends the concept of velocity filtering, originally
defined by Fleet and Jepson in [1] and studied by Dubois
in [2], to all the categories of motion within the approach
pursued in the previous section. To reach that goal, we
introduce integral transforms whose kernels are motion-specific
Green functions. In the following, it is demonstrated
that the motion-specific Green functions can be
equivalently derived from the characters of the group
representations in Section 3 or from the fundamental solution of
the PDE of the wave equation of Section 4. This leads to
convolutional integral transforms twisted along the motion
transformations as presented in Section 2. The interesting
point of this approach comes from the equations of the
wavelet motion themselves (16) expressed in the Fourier domain.
As a result of the existence of the term ^ in A, the
PDE can be re-written in the Fourier domain in the form
of
$$A\, \hat\Psi(g^{-1}X) = \hat\Psi(g^{-1}X) \quad \text{where } X = (k, \omega) \qquad (20)$$
with an eigenvalue at 1. The Green function $G$ for operator
$A$ is the distribution which satisfies $A\, G(g^{-1}X) = \delta(g^{-1}X)$.
The Green function is the Dirac $\delta(g^{-1}X)$ itself. The Green
function is known as the fundamental solution of the PDE as
in Equation (20). If the operator $A$ is injective, then the
inverse $A^{-1}$ exists and provides a convolutional-type integral
transform whose kernel is the Green function, i.e.
If $g = e$, the identity element, we retrieve the usual Fourier
transform with kernel $K(k, \omega) = \delta(\omega) e^{ikx}$. This procedure
defines for each kind of motion the kernel $K(k, \omega; m; x, t)$
that particularizes the usual Fourier transform for the motion
group of interest; $m$ denotes the current motion parameter.
If the Dirac measure is transformed into a continuous
wavelet with compact support, then the calculation
of $\hat\Psi(k, \omega; m)$, animated by motion $m$, from its still cognate
$\Psi(x, t)$ becomes an integral transform with kernel $K$. Let
us, for example, consider the kernel of accelerated wavelets
as propagator presented in Section 3 and integrate with
a still Morlet wavelet [6, 7]; this yields the propagated
wavelets for second-order accelerations
'Pfo(k,u)) = (27 r) e‘T
(23)
Moreover, the function $\Psi$ can now be scaled to extend
the results from the “point mechanics” towards the “object-based
mechanics” as follows
$$\hat\Psi(k, \omega; m, a, a_0) = \int_{\mathbb{R}^n \times \mathbb{R}} K(k, \omega; m; x, t)\, \Psi(x, t; a, a_0)\, d^n x\, dt \qquad (24)$$
We have so far reached the ability to generate, cancel or
modify analyzing wavelets as well as moving patterns.
6. CONCLUSIONS AND APPLICATIONS
mm)
G(x,£) f(x)dx
These kernels are meaningful and recall the propagators
associated with Green functions of the Schrodinger equations.
The meaning of Equation (21) and of the wavelet-based
reproducing kernels [7] leads to the following duality
of the motion analysis.
(1.) If the still version of a signal (wavelet, filter or
stochastic process) $f(x)$ is known, then the reproducing kernel
integral transform provides all the moving versions in
$(x, t)$ or in $(k, \omega)$. These integral transforms generate the
whole family of analyzing signals, wavelets or processes in
the observing space $L^2(\mathbb{R}^n \times \mathbb{R}, d^n x\, dt)$. This allows
spatio-temporal filtering, interpolating, and predicting along a
trajectory.
(2.) If the animated version of a signal is known, then
Equation (21) is a filter that compensates the signal for
a given motion and gives rise to the still signal. This is
motion-compensated filtering. The advantage of such an
approach is that the classical affine wavelet analysis and
processing may then be applied on the compensated signal
(for coding purposes as in [3]). This section brings a more
general point of view on the motion analysis presented in
[3], where motion-compensated filtering was performed by
building the trajectories within the signal and applying discrete
wavelets along the assumed trajectories.
Let us then revisit Section 3 and compute the Fourier
transform of a Dirac measure on a trajectory
This paper has shed light on a novel motion analysis based
on a group-theoretic approach. Let us consider the projection
of moving patterns on sensor arrays, which creates
the most important part of all the acceleration components
embedded in signals. The traffic sequence (Figure 2) is an
example. The projection, which takes place within the cone of
sensor visibility (Figure 1), is a homothety (i.e. a re-scaling).
The projection may be modelled as an orthogonal projection
composed with a scaling. Let us define the z-axis orthogonal
to the sensor plane and the x-y axes in the sensor
plane. The motion captured in the sensor plane is obtained
after a projection on planes $\Pi_0$, $\Pi_1$, $\Pi_2$ parallel to the sensor
at times $\tau = 0, 1, 2$ and a homothety that rescales the projection
down to the plane of the sensor (Figure 1). Let us
denote $W$ the width of the rigid object and $S_0$ the size of
the object captured by the camera. The scale $a_0 = W / S_0$ is
observed from plane $\Pi_0$ at time $\tau = 0$. At time $\tau = n$,
the size perceived from plane $\Pi_n$ by the camera is given by
$a_n = \frac{W}{S_n} = \frac{W}{S_0 (1 - \frac{v_z}{d} \tau)} = \frac{a_0}{1 - \frac{v_z}{d} \tau} = a_0 [1 + \frac{v_z}{d} \tau + (\frac{v_z}{d})^2 \tau^2 + \ldots]$
$= a_0 [1 + a_1 \tau + a_2 \tau^2 + \ldots + a_n \tau^n + \ldots]$. The series is convergent
if $|\frac{v_z}{d} \tau| < 1$, i.e. in agreement with the physical observation. The
components of translation, velocity and accelerations along
x and y axis are rescaled with the ratio -r- = — = .
bo v0 d
References
[1.] A. Tekalp. “Digital Video Processing”, Prentice-Hall,
1995.
[2.] E. Dubois. “Motion-Compensated Filtering of Time-
Varying Image”, Multidim. Syst. Sig. Proc., Vol. 3, pp.
211-239, 1992.
[Figure panel: square of the modulus of the wavelet transform, plotted against time]
Figure 1: Tracking in a sensor cone.
Figure 2: The 20th image of the car digital image sequences.
[3.] J.-P. Leduc, J.-M. Odobez and C. Labit. “Adaptive
Motion-Compensated Wavelet Filtering for Image Sequence
Coding”, IEEE Transactions on Image processing, Vol 6,
No. 6, pp. 862-878, June 1997.
[4.] J.-P. Leduc, J. Corbett, M. Kong, V. M. Wickerhauser,
B. K. Ghosh. “Accelerated Spatio-temporal Wavelet Trans¬
forms: an Iterative Trajectory Estimation” , IEEE ICASSP,
Vol 5, 1998, pp. 2777-2780.
[5.] M. Kong, J.-P. Leduc, B. Ghosh, J. Corbett, V. Wicker¬
hauser. “Wavelet based Analysis of Rotational Motion in
Digital Image Sequences”, ICASSP-98, Seattle, May 12-15,
1998, pp. 2781-2784.
[6.] J.-P. Leduc, F. Mujica, R. Murenzi, M. J. S. Smith.
“Spatio-Temporal Wavelet Transforms for motion track¬
ing”, ICASSP-97, Munich, Vol 4, pp. 3013-3017, 1997.
[7.] J.-P. Leduc, F. Mujica, R. Murenzi, and M. Smith. “Spatio-
Temporal Wavelets: a Group-Theoretic Construction for
Motion Estimation and Tracking” , to appear in SIAM Jour¬
nal of Applied Mathematics.
[8.] J. Corbett, J.-P. Leduc, M. Kong. “Analysis of Deforma-
tional Transformations with Spatio-Temporal Continuous
Wavelet Transforms”, ICASSP-99, March 15-19, 1999.
Figure 3: Estimation of the parameters $v_z/d$ and $a_0$ by computing
the square modulus (energy) of the wavelet transform
as in [8]: $|\langle T(g)\Psi \,|\, s \rangle|^2 = F(a_0, v_z/d)$ is estimated in
the scene displayed in Figure 2. Two local maxima are detected
and displayed at ($v_{z1}/d_1 = 0.5\ \mathrm{s}^{-1}$, $a_{01} = 2.6$) and
($v_{z2}/d_2 = 0.38\ \mathrm{s}^{-1}$, $a_{02} = 1.8$), standing for the foreground and
background car respectively. If we assume $d_1 = 40$ m for the
foreground car and a rate of 25 images per second, then we
can estimate the relative approaching velocity component at
$v_{z1} = 72$ km/h (45 miles/h). For the background car, if we
assume $d_2 = 50$ m, then $v_{z2} = 68.4$ km/h (42.7 miles/h).
Let us remark that the camera is traveling towards the cars;
therefore, both velocities correspond to relative values.
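The velocity figures quoted in this caption follow from multiplying the detected ratio $v_z/d$ by the assumed distance. The short sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes the ratio is already expressed in $\mathrm{s}^{-1}$ (that is, the 25 images per second conversion has already been applied).

```python
def approach_speed_kmh(ratio_per_s: float, distance_m: float) -> float:
    """Relative approach speed v_z = (v_z / d) * d, converted from m/s to km/h."""
    return ratio_per_s * distance_m * 3.6


# Values read from the two local maxima, with the distances assumed in the caption.
foreground = approach_speed_kmh(0.5, 40.0)
background = approach_speed_kmh(0.38, 50.0)
print(f"foreground: {foreground:.1f} km/h, background: {background:.1f} km/h")
```

Because the distances are assumptions, the resulting speeds scale linearly with whatever distance is chosen.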
ANGULAR VELOCITY CHARACTERISTIC FUNCTION; absolute value
Figure 4: Spatio-temporal special function associated with
the rotational motion. The sketch is performed on sections at
constant ui, the angular velocity = 1.5 radian/image.
BLIND NOISE AND CHANNEL ESTIMATION
M. Frikel, W. Utschick, and J. Nossek
Technical University of Munich
Institute for Network Theory and Signal Processing
Arcisstr. 21, D-80290 Munich, Germany
mifr@nws.e-technik.tu-muenchen.de
ABSTRACT
In the classical methods for blind channel identification
(Subspace method, TXK, XBM) [1, 2, 3], the additive
noise is assumed to be spatially white or known to
within a multiplicative scalar. When the noise is nonwhite
(colored or correlated) but has a known covariance
matrix, we can still handle the problem through
prewhitening. However, there are no techniques presently
available to deal with completely unknown noise fields.
It is well known that when the noise covariance matrix
is unknown, the estimated channel parameters may be grossly
inaccurate. In this paper, we assume that the noise is spatially
correlated, and we apply this assumption to blind channel
identification. We estimate the noise covariance
matrix without any assumption except its structure,
which is assumed to be band-Toeplitz. The
performance of the developed method is evaluated and
compared to that of the modified subspace approach (MSS)
[4].
1. INTRODUCTION
One common problem in signal transmission through
any channel is additive noise. In general, additive
noise is generated internally by components such as resistors
and solid-state devices used to implement the
communication system. This is sometimes called thermal
noise or Johnson noise. Other sources of noise and
interference may arise externally to the system, such
as interference from other users. When such noise
and interference occupy the same frequency band as the
desired signal, their effect can be minimized by proper design
of the transmitted signal and its demodulator at
the receiver. The effects of noise may also be minimized by
increasing the power of the transmitted signal. However,
equipment and other practical constraints limit
the power level of the transmitted signal [5].
This work is supported by Alexander von Humboldt-
Stiftung, Bundesrepublik Deutschland.
The classical model used in communication systems
supposes, on the one hand, that the noise power
is identical at each sensor and, on the other hand, that
the noise has no spatial or temporal correlation. However, this
situation is seldom met, which causes a clear degradation
of the performance of the subspace methods.
Here, we recall some well-known methods which treat
the noise problem in array processing for direction-of-arrival
estimation. In recent years, there has
been a growing interest in techniques
that aim to lower the signal-to-noise ratio
resolution threshold or to cope with spatially colored noise
[6, 7, 8, 9, 10]. The ambient noise is unknown in practice;
therefore, it must be modeled or estimated.
The methods developed for this problem are very few
and there is no definitive solution. There are some
practical methods: in [11], two methods are obtained by
optimizing a criterion and by using AR or ARMA
modeling of the noise. In [7], the spatial correlation matrix
of the noise is modeled by known Bessel functions. In
[6], the ambient noise covariance matrix is modeled by
a sum of Hermitian matrices known up to a multiplicative
scalar. In [8], this estimate is obtained by measuring the
array covariance matrix when no signals are present.
This procedure assumes that the noise does not change
over time, which is not fulfilled in several application
domains. Another possibility [8] arises when
the correlation structure is known to be invariant under
a translation or rotation. The so-called covariance
differencing technique can then be applied to reduce the
noise influence. This method requires two identical, translated
and/or rotated measurements of the array covariance
matrix and assumes that the noise covariance matrix
is invariant while the source signals
change between the two measurements. The
noise covariance matrix is then eliminated by a simple
subtraction. However, this method cannot be applied
when the source covariance matrix satisfies the same
invariance property or when only one measurement is
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
available. In [7], a particular model of the structure of the noise
covariance matrix, which takes into account the characteristics
of the noise relative to its origins, is given. Recently,
a maximum a posteriori (MAP) approach has been developed
in [10]; this method can only be applied in the
case of a linear array. In [9], the method called “Instrumental
Variable” (IV) is used to reduce the noise
without estimating it; this estimator assumes that the
noise is temporally independent. One technique based
on the MDL criterion has been developed in [12] for detection
and localization of the signals in the presence
of unknown noise; this estimator is asymptotically biased
[12]. However, the study of the noise for blind
channel identification is very limited. In [4], a modified
subspace method (MSS) for blind identification in
the presence of unknown correlated noise has been presented;
it uses covariance matrices at a time lag
for which the noise is absent. The object of this correspondence
is to improve blind channel identification in
the presence of correlated noise by whitening the received
data. The noise is assumed to be spatially correlated.
The structure of the paper is as follows. In Section 2,
we present the studied problem. In Section 3,
we describe the noise covariance matrix model used
in this study and its estimation by the proposed algorithm,
and we apply the noise estimate to blind channel
identification using the subspace method. In Section 4,
we present some simulation results and performance
comparisons.
2. PROBLEM FORMULATION
Consider L FIR channels driven by a common source.
The output vector of the ith channel can be written as:
$\mathbf{r}_i(k) = \mathbf{H}^{(i)} \mathbf{s}(k) + \mathbf{n}_i(k)$, (1)
where $\mathbf{r}_i(k)$ is the output sequence of the ith channel,
$\mathbf{s}(k)$ is the input sequence and $\mathbf{n}_i(k)$ is the noise
sequence on the ith channel:
$\mathbf{r}_i(k) = [r_i(k) \; r_i(k+1) \; \ldots \; r_i(k+N-1)]$,
$\mathbf{s}(k) = [s(k-M) \; s(k-M+1) \; \ldots \; s(k+N-1)]$,
$\mathbf{n}_i(k) = [n_i(k) \; n_i(k+1) \; \ldots \; n_i(k+N-1)]$,
$\mathbf{H}^{(i)} = \begin{pmatrix} h_0^{(i)} & h_1^{(i)} & \cdots & h_M^{(i)} & \cdots & 0 \\ 0 & h_0^{(i)} & h_1^{(i)} & \cdots & \cdots & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & \cdots & 0 & h_0^{(i)} & \cdots & h_M^{(i)} \end{pmatrix}$,
where $h^{(i)}$ is the impulse response of the ith channel,
M is the maximum order of the L channels and N is
the width of the temporal window. $\mathbf{H}^{(i)}$ is of dimension
(N x (N + M)).
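Then we have,
$\mathbf{r}(k) = \mathcal{H} \mathbf{s}(k) + \mathbf{n}(k)$, (2)
that is,
$\begin{pmatrix} \mathbf{r}_1(k) \\ \vdots \\ \mathbf{r}_L(k) \end{pmatrix} = \begin{pmatrix} \mathbf{H}^{(1)} \\ \vdots \\ \mathbf{H}^{(L)} \end{pmatrix} \mathbf{s}(k) + \begin{pmatrix} \mathbf{n}_1(k) \\ \vdots \\ \mathbf{n}_L(k) \end{pmatrix}$.
The matrix $\mathcal{H}$ is known as the (LN x (N+M)) filtering
matrix, which has full rank (N + M) under the
following assumptions: the L channels do not share a
common zero and N > (M + 1).
The blind identification problem is to find $\mathcal{H}$ from the
sequence
{$\mathbf{r}(k)$ for k = 1, 2, ..., K}.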
The subspace method [1] exploits the covariance
matrix of all channel outputs, $\Gamma = E[\mathbf{r}\mathbf{r}^+]$, estimated by the
sample covariance matrix
$\hat{\Gamma} = \frac{1}{K} \sum_{k=1}^{K} \mathbf{r}(k) \mathbf{r}^+(k)$, where K is the number of samples
and + denotes the conjugate transpose. Assume
that the signals and the additive noise are independent,
stationary and ergodic zero-mean complex-valued random
processes. As K becomes large, this matrix
has the asymptotic structure $\Gamma = \mathcal{H} \Gamma_s \mathcal{H}^+ + \Gamma_n$,
with $\Gamma_n = E[\mathbf{n}\mathbf{n}^+]$ the noise covariance matrix and
$\Gamma_s = E[\mathbf{s}\mathbf{s}^+]$ the signal covariance matrix.
The goal of blind channel identification and equalization
is to identify $\mathcal{H}$ (channel identification) and to estimate
$\mathbf{s}(k)$ from $\mathbf{r}(k)$ (channel equalization).
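The subspace blind channel identification procedure [1]
consists of estimating the (L(M+1) x 1) vector h
of channel coefficients from the observation vector. This
approach is based on the eigendecomposition
of the data covariance matrix,
$\Gamma = [\mathbf{U}_s \; \mathbf{U}_n] \, \Lambda \, [\mathbf{U}_s \; \mathbf{U}_n]^+$,
where $\mathbf{U}_s$ and $\mathbf{U}_n$ collect the signal and noise eigenvectors.
The subspace method yields an estimate $\hat{\mathcal{H}}$ of $\mathcal{H}$ by
solving the equation $\mathbf{U}_n^+ \hat{\mathcal{H}} = 0$ in a least-squares sense
(where $\hat{\mathcal{H}}$ is subject to the same structure as $\mathcal{H}$). This
estimate is uniquely (up to a constant scalar) equal to
$\mathcal{H}$. From [1], we have:
$\mathbf{U}_n^+ \mathcal{H} = \mathbf{h}^+ \mathcal{U}_n = 0$, (3)
where $\mathcal{U}_n$ is the (L(M + 1) x (N + M)) matrix obtained
by stacking the L filtering matrices $\mathcal{U}_n^{(i)}$.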
Un = \lAn^T , where,
uP =
and h= [h(0),...,h<i x)], with hw = .
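The optimization system derived in [1] is:
$\hat{\mathbf{h}} = \arg\min_{\|\mathbf{h}\| = 1} \mathbf{h}^+ U_{33} \mathbf{h}$,
where
$U_{33} = \sum_{i=0}^{LN-M-N-1} \mathcal{U}^{(i)} \mathcal{U}^{(i)+}$
is the filtering noise projection matrix.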
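The subspace estimator above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a noise-free covariance with unit-power white input ($\Gamma_s = I$), random channel taps, and a particular ordering of the filtering matrices (row j holds the coefficients shifted right by j). The function names `filtering_matrix` and `subspace_channel_estimate` are hypothetical.

```python
import numpy as np


def filtering_matrix(v, rows):
    """Toeplitz matrix of shape (rows, rows + len(v) - 1); row j holds v shifted right by j."""
    v = np.asarray(v)
    out = np.zeros((rows, rows + len(v) - 1), dtype=complex)
    for j in range(rows):
        out[j, j:j + len(v)] = v
    return out


def subspace_channel_estimate(cov, L, N, M):
    """Estimate the stacked channel vector h (up to a scalar) from a covariance matrix."""
    _, eigvec = np.linalg.eigh(cov)              # eigenvalues in ascending order
    noise_vecs = eigvec[:, : L * N - (N + M)]    # eigenvectors of the smallest eigenvalues
    Q = np.zeros((L * (M + 1), L * (M + 1)), dtype=complex)
    for g in noise_vecs.T:
        # Stack, channel by channel, the (M+1) x (N+M) filtering matrix
        # built from each length-N block of the noise eigenvector g.
        G = np.vstack([filtering_matrix(g[l * N:(l + 1) * N], M + 1) for l in range(L)])
        Q += G @ G.conj().T
    _, vecs = np.linalg.eigh(Q)
    return vecs[:, 0]                             # unit-norm minimizer of h^+ Q h


# Illustrative setup with the paper's dimensions and random (assumed) channel taps.
rng = np.random.default_rng(0)
L, N, M = 4, 10, 4
h = rng.standard_normal((L, M + 1)) + 1j * rng.standard_normal((L, M + 1))
H = np.vstack([filtering_matrix(h[l], N) for l in range(L)])
h_hat = subspace_channel_estimate(H @ H.conj().T, L, N, M)
```

In this noise-free setting the estimate should be collinear with the true stacked channel vector; with a sample covariance and colored noise the noise eigenvectors are perturbed, which is the problem the rest of the paper addresses.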
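The noise is assumed to be Gaussian, complex and spatially
correlated. Its real and imaginary parts are assumed to be
independent and Gaussian, with $E[\mathbf{n}_i] = 0$, $E[\mathbf{n}_i \mathbf{n}_i^T] = 0$,
and $E[\mathbf{n}_i \mathbf{n}_i^+] = \Gamma_n$. $\Gamma_n$ is the noise covariance matrix;
the superscripts “*” and “+” denote conjugate
and conjugate transpose, respectively. We consider that the
noise covariance matrix is band, defined by:
$\Gamma_n(i,m) = 0$ if $|i - m| \geq K$,
$\Gamma_n(i,m) = \rho_{im}$ if $|i - m| < K$ and $i \neq m$,
$\Gamma_n(i,m) = \sigma_i^2$ if $i = m$,
where $\rho_i = \bar{\rho}_i + j\tilde{\rho}_i$, $i = 1, \ldots, K$, are complex variables,
$j^2 = -1$, $\sigma_i^2$ is the noise variance at each receiver,
and K is the spatial noise correlation length:
$\Gamma_n = \begin{pmatrix} \sigma_1^2 & \rho_{12} & \cdots & \rho_{1K} & \cdots & 0 \\ \rho_{21} & \sigma_2^2 & \rho_{23} & \cdots & \cdots & 0 \\ \vdots & & \ddots & \rho_{ij} & & \vdots \\ 0 & \cdots & 0 & \rho_{(LN)K} & \cdots & \sigma_{LN}^2 \end{pmatrix}$.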
There are two ways to recover a noise-free observation covariance
matrix. The first is to subtract the
noise covariance matrix, $\mathcal{H} \Gamma_s \mathcal{H}^+ = \Gamma - \Gamma_n$, which
gives a “clean” observation covariance matrix; however,
the result may fail to be positive semidefinite if $\Gamma_n$ is badly
estimated.
The second is whitening; in this case we recover the classical
model of communication systems, $\Gamma_n^{-1/2} \Gamma \Gamma_n^{-1/2}$. This
processing is more robust but requires a higher
computational load.
From the data matrix $\Gamma = \mathcal{H} \Gamma_s \mathcal{H}^+ + \Gamma_n$, the goal of the
first part of this paper is to estimate the noise covariance
matrix $\Gamma_n$; in the second part, we blindly estimate
$\mathcal{H}$ from the “clean” matrix $[\mathcal{H} \Gamma_s \mathcal{H}^+]$
using the subspace method [1].
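The two denoising options can be compared with a small sketch. It assumes the noise covariance is known and positive definite, and it computes the inverse square root through an eigendecomposition; the function names are hypothetical and the 5 x 5 matrices are illustrative only.

```python
import numpy as np


def inverse_sqrt(mat):
    """Hermitian inverse square root via eigendecomposition (mat must be positive definite)."""
    w, v = np.linalg.eigh(mat)
    return (v / np.sqrt(w)) @ v.conj().T


def denoise_by_subtraction(cov, noise_cov):
    """First option: subtract the noise covariance from the observation covariance."""
    return cov - noise_cov


def denoise_by_whitening(cov, noise_cov):
    """Second option: whiten, so that the noise contribution becomes the identity."""
    root = inverse_sqrt(noise_cov)
    return root @ cov @ root


# Illustrative band (tridiagonal) noise covariance and rank-one signal covariance.
idx = np.arange(5)
noise_cov = np.eye(5) + 0.3 * (np.abs(idx[:, None] - idx[None, :]) == 1)
a = np.array([[1.0], [2.0], [0.5], [-1.0], [0.0]])
signal_cov = a @ a.T
cov = signal_cov + noise_cov
whitened = denoise_by_whitening(cov, noise_cov)
```

After whitening, the noise part is the identity matrix, so the classical white-noise subspace analysis applies to the transformed signal part.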
3. BLIND NOISE ESTIMATION (BNE)
In many applications such as communication systems,
it is reasonable to assume that the correlation decreases
along the receivers. This is a widely used model for
colored noise. The correlation rate $\rho$ decreases
as the distance between two receivers increases.
In this study, we consider a band-Toeplitz noise covariance
matrix whose diagonal values are decreasing,
the so-called decreasing band-Toeplitz matrix. This is the only
assumption used to estimate the noise covariance matrix.
The BNE algorithm for the noise covariance matrix
estimation is summarized in the following steps:
Step 1: - Estimation and eigendecomposition of the received
covariance matrix $\hat{\Gamma}$: $\hat{\Gamma} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{r}_t \mathbf{r}_t^+$, where T is
the number of independent realizations; $\hat{\Gamma} = \mathbf{U} \Lambda \mathbf{U}^+$,
where $\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_{LN}]$ and $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{LN}]$;
$\lambda_i$ and $\mathbf{u}_i$ are the eigenvalues and the eigenvectors of
the observation covariance matrix, respectively.
- Initialization of the noise covariance matrix: $\Gamma_n = 0$.
Step 2: - Calculation of the matrix $\mathbf{W}_{N+M} = \mathbf{U}_s \Lambda_s^{1/2}$,
where $\mathbf{U}_s = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N+M}]$ is the matrix of the (N+M)
eigenvectors corresponding to the (N+M) largest eigenvalues,
and $\Lambda_s = \mathrm{diag}[\lambda_1, \ldots, \lambda_{N+M}]$ is the matrix of these (N+M)
eigenvalues.
- Calculation of the matrix $\Delta = \mathbf{W}_{N+M} \mathbf{W}_{N+M}^+$.
Step 3: Calculation of $\Gamma_n^{(1)} = \mathrm{band}[\hat{\Gamma} - \Delta]$, where
$\Gamma_n^{(1)}$ is the band noise covariance matrix at the first iteration
and $\mathrm{band}[\cdot]$ designates the band matrix with
bandwidth (K + 1).
Step 4: Eigendecomposition of the matrix $\hat{\Gamma} - \Gamma_n^{(1)} = \mathbf{V} \Lambda \mathbf{V}^+$.
The new matrices $\Delta$ and $\Gamma_n^{(2)}$ are then
estimated as in step 2 and step 3. These iterations are repeated
until $\Gamma_n$ no longer improves.
Stop test: The algorithm is stopped when the distance
between $\Gamma_n^{(i)}$ and $\Gamma_n^{(i+1)}$ becomes less than some value
$\epsilon$. We define the distance between $\Gamma_n^{(i)}$ and $\Gamma_n^{(i+1)}$ as
$\| \Gamma_n^{(i+1)} - \Gamma_n^{(i)} \|_F$, the Frobenius norm of the matrix
$\Gamma_n^{(i+1)} - \Gamma_n^{(i)}$.
The estimated noise covariance matrix $\hat{\Gamma}_n$ is obtained
when the algorithm is stopped.
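The iteration described in these steps can be sketched as follows. This is an illustrative reading of the steps, not the authors' code: the band operator is assumed to keep entries with $|i - m| < K$, the Toeplitz constraint is not enforced, $\Delta$ is formed directly as $\mathbf{U}_s \Lambda_s \mathbf{U}_s^+$ (equal to $\mathbf{W}\mathbf{W}^+$ when the retained eigenvalues are nonnegative), and a maximum iteration count is added as a safeguard. The names `band` and `blind_noise_estimate` are hypothetical.

```python
import numpy as np


def band(mat, K):
    """Keep the entries with |i - m| < K and set all others to zero."""
    idx = np.arange(mat.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) < K
    return np.where(mask, mat, 0)


def blind_noise_estimate(cov, signal_rank, K, eps=1e-6, max_iter=200):
    """Iteratively estimate a band noise covariance from an observation covariance."""
    noise = np.zeros_like(cov)                             # Step 1: initialization
    for _ in range(max_iter):
        w, v = np.linalg.eigh(cov - noise)                 # Steps 1 and 4: eigendecomposition
        ws, vs = w[-signal_rank:], v[:, -signal_rank:]     # largest eigenpairs
        delta = (vs * ws) @ vs.conj().T                    # Step 2: Delta = U_s Lambda_s U_s^+
        new_noise = band(cov - delta, K)                   # Step 3: band part of the residual
        if np.linalg.norm(new_noise - noise, "fro") < eps: # stop test on the Frobenius distance
            return new_noise
        noise = new_noise
    return noise


# Illustrative data: rank-3 signal part plus a tridiagonal (K = 2) noise covariance.
rng = np.random.default_rng(0)
n, rank, K = 12, 3, 2
A = rng.standard_normal((n, rank)) + 1j * rng.standard_normal((n, rank))
idx = np.arange(n)
true_noise = np.eye(n) + 0.3 * (np.abs(idx[:, None] - idx[None, :]) == 1)
cov = A @ A.conj().T + true_noise
est = blind_noise_estimate(cov, rank, K)
```

The sketch only guarantees the structure of the output (Hermitian and band); how close it gets to the true noise covariance depends on the scenario, which is what the simulations in Section 4 examine.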
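The matrix $\hat{\Gamma}_n$ is used to “denoise” the received data.
The noise-free received covariance matrix is
$\tilde{\Gamma} = \hat{\Gamma} - \hat{\Gamma}_n$ or $\tilde{\Gamma} = \hat{\Gamma}_n^{-1/2} \hat{\Gamma} \hat{\Gamma}_n^{-1/2}$. This “clean” matrix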
is used to estimate the channel matrix. In order to evaluate
its performance, we apply the subspace method
[1]. Moulines et al. [1] showed that if the
subchannels do not share common zeros, h is uniquely
determined by the noise subspace $\mathbf{U}_n$. The subspace estimator
is given by
$\hat{\mathbf{h}} = \arg\min_{\|\mathbf{h}\| = 1} \mathbf{h}^+ U_{33} \mathbf{h}$, where $U_{33}$ is the filtering noise
projection matrix estimated from the “clean” data covariance
matrix. This estimator does not require
knowledge of the source covariance as long as $\Gamma_s > 0$.
We also compare our result to the modified subspace
(MSS) method [4].
4. PERFORMANCE EVALUATION
To demonstrate the efficiency of the proposed algorithm,
some computer simulations have been conducted.
In the following simulations, we take the parameters
described in [1]: the number of virtual channels
is L = 4; the width of the temporal window is N = 10;
the degree of the ISI is M = 4; the channel coefficients
are given by [1]:
      h0               h1               h2               h3
 -0.049+0.359j    0.443-0.0364j   -0.211-0.322j    0.417-0.030j
  0.482+0.569j    1               -0.199-0.918j    1
 -0.556+0.587j    0.921-0.194j     1               0.873-0.145j
  1               0.189-0.208j    -0.284-0.524j    0.285+0.309j
 -0.171+0.061j   -0.087-0.054j     0.136-0.190j   -0.049+0.161j
Table 1: Four virtual complex channels.
For all these simulations, the number of data samples
used to estimate each h ranges from 100 to 1000 in
steps of 100.
The root mean-square error (RMSE) defined below
is employed as a performance measure of the
estimates:
$\mathrm{RMSE} = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \| \hat{\mathcal{H}}_i - \mathcal{H} \|^2}$, where K is the
number of trials (100 in our case) and $\hat{\mathcal{H}}_i$ is the estimate
from the ith trial.
The signal to noise ratio (SNR) is defined as:
SNR = 10 log10 [...]. We define the Frobenius
norm of the estimation error (EE) of the noise covariance
matrix as: $\mathrm{EE} = \| \hat{\Gamma} - (\mathcal{H} \Gamma_s \mathcal{H}^+ + \Gamma_n) \|_F$.
We compare the presented algorithm with existing
methods such as the modified subspace approach
(MSS) [4]. This comparison is based on the root mean-square
error of the channel matrix estimates. We recall
this approach in the following. Let $\Gamma(\tau) = \mathcal{H} \mathbf{J}(\tau) \mathcal{H}^+ + \Gamma_n(\tau)$,
where $\mathbf{J}(\tau)$ is the (N + M) x (N + M) shift
matrix. In [4], one assumes that $\Gamma_n(\tau) = 0$ as long
as $\tau \geq N$. Therefore, we have the relation $\Gamma(\tau) = \mathcal{H} \mathbf{J}(\tau) \mathcal{H}^+$
for $\tau \geq N$. At the time lag $\tau = N$, $\Gamma(N) =
\mathcal{H} (\mathbf{J}(N) + \mathbf{J}(N)^+) \mathcal{H}^+$; the matrix $\Gamma(N)$ is used to estimate
the channel parameters.
Figures 1a and 1b present the root mean-square
error (RMSE) of the parameter estimates for a band-Toeplitz
noise covariance matrix and the Frobenius norm
of the estimation error (EE) of the noise covariance matrix
versus the number of samples.
Figure 1: (a) Root mean-square error (RMSE) of the parameter
estimates (band-Toeplitz noise covariance matrix). (b) Frobenius
norm of the estimation error (EE) of the noise covariance matrix
(band-Toeplitz noise covariance matrix) versus number of samples.
In the case of a band noise covariance matrix with a
correlation length K = 4, Figures 2a and 2b show the results
versus SNR between 0 dB and 16 dB.
Figure 2: (a) Root mean-square error of the parameter estimates
(band-Toeplitz noise covariance matrix (K = 4)) versus
SNR. (b) Frobenius norm of the estimation error (EE) of the noise
covariance matrix as a function of the number of iterations.
We study the influence of the correlation length on
the error of the noise covariance matrix estimate (Figure 3a)
and of the channel parameters (Figure 3b). The
correlation length varies between K = 1 and
K = 4, with SNR = 3 dB.
The normalized error (NE) is defined by NE = [...].
We consider a band noise covariance matrix, and we
estimate the normalized error and the Frobenius norm
for different scenarios of the channel matrix (Figures 4a
and 4b).
These simulations show that the processing, which consists
of first estimating the noise covariance matrix
and then prewhitening the observations, has many advantages
and is more efficient than the modified subspace
(MSS) approach [4]. The use of the denoised subspace
REFERENCES
Figure 3: (a) Root mean-square error of the parameter estimates
versus correlation length. (b) Frobenius norm of the estimation
error (EE) of the noise covariance matrix as a function of
correlation length.
Figure 4: (a) Normalized error (NE) of the parameter estimates
versus scenarios of the channel matrix when the noise covariance
matrix is band. (b) Frobenius norm of the estimation error (EE) of the
band noise covariance matrix as a function of scenarios of the channel
matrix.
method presented in this paper becomes interesting in
the case of low SNR and when the noise covariance
matrix is band. When the correlation length increases,
the benefit of estimating the noise increases
as well. Several computer simulations confirm these
conclusions.
This algorithm can also be applied naturally to
other blind channel identification methods such as XBM and
TXK [2, 3], regardless of the type of system used.
5. CONCLUSION
An algorithm was presented to estimate blindly the noise
and then the channel parameters. We have considered
spatially correlated noise, with the only assumption
that the noise covariance matrix is band-Toeplitz. Then, by an
iterative algorithm using the eigenstructure, we have estimated
the noise parameters. In order to use “clean”
data for the estimation of the channel matrix, the
estimated noise matrix was used for “prewhitening”
the observations. The subspace approach was then
applied for the blind estimation of the channel parameters.
[1] E. Moulines, P. Duhamel, J.-F. Cardoso, and
S. Mayrargue, “Subspace methods for the blind identification
of multichannel FIR filters,” IEEE Trans. on
Signal Processing, vol. 43, no. 2, pp. 516-525, Feb.
1995.
[2] L. Tong, G. Xu, and T. Kailath, “Blind identification
and equalization based on second-order statistics: A
time domain approach,” IEEE Trans. on Information
Theory, vol. 40, no. 2, pp. 340-349, Mar. 1994.
[3] J. Xavier, V. Barroso, and J. Moura, “Closed-form
blind channel identification and source separation in
SDMA systems through correlative coding,” accepted
for IEEE Journal on Selected Areas in Communications,
Special Issue on Signal Processing for Wireless
Communications, 1997.
[4] K. Abed-Meraim, Y. Hua, P. Loubaton, and
E. Moulines, “Subspace method for blind identification
of multichannel FIR systems in noise field with unknown
spatial covariance,” IEEE Signal Processing Letters,
vol. 4, no. 5, pp. 135-137, May 1997.
[5] J. G. Proakis, “Digital Communications,” 3rd ed.,
McGraw-Hill, 1995.
[6] J. Bohme and D. Krauss, “On least squares methods
for direction of arrival estimation in the presence of
unknown noise fields,” in Proceedings IEEE-ICASSP’88,
New York, NY, Apr. 1988, pp. 2833-2836.
[7] B. Friedlander and A. J. Weiss, “Direction finding using
noise covariance modeling,” IEEE Trans. on Signal
Processing, vol. SP-43, no. 7, pp. 1557-1567, Jul. 1995.
[8] A. Paulraj and T. Kailath, “Eigenstructure methods
for direction of arrival estimation in the presence of
unknown noise field,” IEEE Trans. Acoust., Speech,
Signal Processing, vol. 34, no. 1, pp. 276-280, Feb.
1986.
[9] P. Stoica, M. Viberg, and B. Ottersten, “Instrumental
Variable approach to array processing in spatially
correlated noise fields,” IEEE Trans. on Signal
Processing, vol. 42, no. 1, pp. 121-133, 1994.
[10] K. M. Wong, J. Reilly, Q. Wu, and S. Qiao, “Estimation
of the direction-of-arrival of signals in the
unknown correlated noise, part I: The MAP approach
and its implementation,” IEEE Trans. on Signal
Processing, vol. 40, no. 8, pp. 2007-2017, Aug. 1992.
[11] J.-P. Le Cadre, “Parametric methods for spatial signal
processing in the presence of unknown colored noise
fields,” IEEE Trans. Acoust., Speech, Signal Processing,
vol. ASSP-37, no. 7, pp. 965-983, Jul. 1989.
[12] M. Wax, “Detection and localization of multiple
sources in noise with unknown covariance,” IEEE
Trans. on Signal Processing, vol. 40, no. 1, pp. 245-249,
Sep. 1991.
[13] V. Barroso, J. Moura, and J. Xavier, “Blind array
channel division multiple access (AChDMA) for mobile
communications,” IEEE Trans. on Signal Processing,
vol. 46, no. 3, pp. 516-525, Mar. 1998.
MULTIUSER DETECTION IN IMPULSIVE NOISE VIA SLOWEST
DESCENT SEARCH
Predrag Spasojevic
WINLAB,
Dept, of Electrical and Computer Eng.,
Rutgers University,
Piscataway, NJ 08854.
ABSTRACT
A new technique is proposed for robust multiuser detection
in the presence of non-Gaussian ambient noise. This
method is based on minimizing a certain cost function (e.g.,
the Huber penalty function) over a discrete set of candidate
user bit vectors. The set of candidate points is chosen
based on the so-called “slowest-descent search”, starting
from the estimate closest to the unconstrained minimizer of
the cost function, and along mutually orthogonal directions
where this cost function grows the slowest. The extension of
the proposed technique to multiuser detection in unknown
multipath fading channels is also proposed. Simulation
results show that this new technique offers substantial
performance improvement over the recently proposed robust
multiuser detectors, with little attendant increase in
computational complexity.
1. INTRODUCTION
Recently, a robust multiuser detection technique was developed
in [4] for demodulating multiuser signals in the
presence of both multiple-access interference and impulsive
ambient channel noise. This technique is based on the
M-estimation method for robust regression, and is essentially
a robustized version of the linear decorrelating multiuser
detector. Although this robust multiuser detector offers
significant performance gain over the linear decorrelator in
impulsive noise, there is still a large gap between its
performance and that of the maximum likelihood (ML) multiuser
detector. However, the computational complexity of
ML detection is quite high; moreover, ML detection
requires knowledge of the exact probability distribution
of the noise, which may not be available to the receiver.
Hence, it is of interest to develop robust, low-complexity,
and near-optimal multiuser detection techniques for
non-Gaussian noise channels. Furthermore, it is highly
important to be able to extend this
method to more general asynchronous unknown multipath
fading channels. These issues are the subject of this paper.
P. Spasojevic was supported in part by the WINLAB /Lucent
Technologies Wireless Post-Doctoral Fellowship. X. Wang was
supported in part by the NSF grant CAREER CCR-9875314.
Xiaodong Wang
Department of Electrical Engineering,
Texas A&M University,
College Station, TX 77843.
2. SYNCHRONOUS SYSTEM MODEL
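First consider the following discrete-time synchronous
CDMA signal model. At any time instant, the received signal
is the superposition of K user signals, plus the ambient
noise, given by
$\mathbf{r} = \sum_{k=1}^{K} \alpha_k b_k \mathbf{s}_k + \mathbf{n} = \mathbf{S} \mathbf{A} \mathbf{b} + \mathbf{n}$, (1)
where $\mathbf{s}_k = \frac{1}{\sqrt{N}} [s_{1,k} \ldots s_{N,k}]^T$ is the normalized signature
sequence of the k-th user; N is the processing gain;
$b_k \in \{+1, -1\}$ and $\alpha_k$ are respectively the data bit and the
complex amplitude of the k-th user; $\mathbf{S} = [\mathbf{s}_1 \cdots \mathbf{s}_K]$;
$\mathbf{A} = \mathrm{diag}(\alpha_1, \cdots, \alpha_K)$; $\mathbf{b} = [b_1 \cdots b_K]^T$; and $\mathbf{n} = [n_1 \cdots n_N]^T$
is a vector of independent and identically distributed (i.i.d.)
ambient noise samples with independent real and imaginary
components. Denote
$\mathbf{y} = \begin{bmatrix} \Re\{\mathbf{r}\} \\ \Im\{\mathbf{r}\} \end{bmatrix}$, $\Psi = \begin{bmatrix} \mathbf{S} \Re\{\mathbf{A}\} \\ \mathbf{S} \Im\{\mathbf{A}\} \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} \Re\{\mathbf{n}\} \\ \Im\{\mathbf{n}\} \end{bmatrix}$,
where $\mathbf{v}$ is a real noise vector consisting of 2N i.i.d. samples.
Then (1) can be written as
$\mathbf{y} = \Psi \mathbf{b} + \mathbf{v}$. (2)
It is assumed that each element $v_j$ of $\mathbf{v}$ follows a two-term
Gaussian mixture distribution, i.e.,
$v_j \sim (1 - \epsilon) \mathcal{N}(0, \nu^2) + \epsilon \mathcal{N}(0, \kappa \nu^2)$, (3)
with $0 < \epsilon < 1$ and $\kappa > 1$. Here the term $\mathcal{N}(0, \nu^2)$ represents
the nominal ambient noise, and the term $\mathcal{N}(0, \kappa \nu^2)$
represents an impulsive component. The probability that
impulses occur is $\epsilon$. Note that the overall variance of the
noise sample $v_j$ is
$\frac{\sigma^2}{2} = (1 - \epsilon) \nu^2 + \epsilon \kappa \nu^2$. (4)
We have $\mathrm{Cov}\{\mathbf{v}\} = \frac{\sigma^2}{2} \mathbf{I}_{2N}$ and $\mathrm{Cov}\{\mathbf{n}\} = \sigma^2 \mathbf{I}_N$. The
model (3) serves as an approximation to the more fundamental
Middleton Class A noise model [2, 5], and has been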
used extensively to model physical noise arising in radio
and acoustic channels. Recently, it has been shown that
another class of non-Gaussian distributions, the $\alpha$-stable
distributions, can be well approximated by a finite mixture
of Gaussians [1]. In what follows, we consider the problem
of detecting the transmitted symbols b of all users based on
the signal model (2).
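The two-term mixture (3) and its overall variance (4) can be checked with a small simulation. The sketch below uses only the standard library; the function names and the parameter values ($\epsilon = 0.1$, $\kappa = 100$, $\nu = 1$) are illustrative assumptions rather than values taken from the paper.

```python
import random


def mixture_variance(eps, kappa, nu):
    """Overall variance (1 - eps) * nu^2 + eps * kappa * nu^2, as in Eq. (4)."""
    return (1 - eps) * nu**2 + eps * kappa * nu**2


def sample_mixture(n, eps, kappa, nu, seed=0):
    """Draw n samples from (1 - eps) N(0, nu^2) + eps N(0, kappa * nu^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # With probability eps the sample comes from the impulsive component.
        std = nu * kappa**0.5 if rng.random() < eps else nu
        out.append(rng.gauss(0.0, std))
    return out


samples = sample_mixture(200_000, eps=0.1, kappa=100.0, nu=1.0)
empirical = sum(x * x for x in samples) / len(samples)
theoretical = mixture_variance(0.1, 100.0, 1.0)
print(f"empirical variance {empirical:.2f}, Eq. (4) gives {theoretical:.2f}")
```

Even a small impulse probability dominates the overall variance when $\kappa$ is large, which is why detectors designed for Gaussian noise of the same total power behave poorly here.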
3. EXHAUSTIVE-SEARCH DETECTION AND
DECORRELATIVE DETECTION
In this section, we give a unified description of a number
of approaches to the problem of multiuser detection
in non-Gaussian noise. There are primarily two categories
of such detectors for estimating b from y in (2), all based
on minimizing the sum of a certain function $\rho$ of the chip
residuals
$C(\mathbf{b}; \mathbf{y}) = \sum_{j=1}^{2N} \rho(y_j - \xi_j^T \mathbf{b})$, (5)
where $\xi_j^T$ is the j-th row of the matrix $\Psi$.
• Exhaustive-search detector:
$\hat{\mathbf{b}}^e = \arg\min_{\mathbf{b} \in \{+1,-1\}^K} C(\mathbf{b}; \mathbf{y})$. (6)
• Decorrelative detector:
$\beta = \arg\min_{\mathbf{b} \in \mathbb{R}^K} C(\mathbf{b}; \mathbf{y})$, (7)
$\mathbf{b}^* = \mathrm{sign}(\beta)$. (8)
It is seen that exhaustive-search detection is based on
the discrete minimization of the cost function C(b; y) over
$2^K$ candidate points, whereas decorrelative detection
is based on the continuous minimization of the same cost
function. In general, the optimization problem (7) can be
solved iteratively according to the following steps [4]:
$\mathbf{z}^l = \psi(\mathbf{y} - \Psi \beta^l)$, (9)
$\beta^{l+1} = \beta^l + [\ldots]$, $l = 0, 1, \ldots$, (10)
We consider the following three choices of the penalty
function $\rho(\cdot)$ in (5), corresponding to different forms of
detectors:
• Log-likelihood penalty function:
$\rho_{ML}(x) = -\log f(x)$, $\psi_{ML}(x) = -\frac{f'(x)}{f(x)}$, (11)
where $f(\cdot)$ denotes the probability density function
(pdf) of the noise sample. In this case, the exhaustive-search
detector (6) corresponds to the ML detector, and
the decorrelative detector (8) corresponds to the ML
decorrelator [4].
• Least-squares penalty function:
$\rho_{LS}(x) = \frac{1}{2} x^2$, $\psi_{LS}(x) = x$. (12)
In this case, the exhaustive-search detector (6) corresponds
to the ML detector based on a Gaussian
noise assumption, and the decorrelative detector (8)
corresponds to the linear decorrelator.
• Huber penalty function:
$\rho_H(x) = \begin{cases} \frac{x^2}{\sigma^2}, & \text{if } |x| \leq \frac{c\sigma^2}{2}, \\ c|x| - \frac{c^2\sigma^2}{4}, & \text{if } |x| > \frac{c\sigma^2}{2}, \end{cases}$ (13)
$\psi_H(x) = \begin{cases} \frac{2x}{\sigma^2}, & \text{if } |x| \leq \frac{c\sigma^2}{2}, \\ c\,\mathrm{sign}(x), & \text{if } |x| > \frac{c\sigma^2}{2}, \end{cases}$ (14)
where $\frac{\sigma^2}{2}$ is the noise variance given by (4), and
c = [...] is a constant. In this case, the exhaustive-search
detector (6) corresponds to the discrete minimizer
of the Huber cost function, and the decorrelative
detector (8) corresponds to the robust decorrelator
proposed in [4].
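A Huber-type penalty and its derivative can be written in a few lines. The sketch below is a generic form under stated assumptions: `s` stands for the per-sample noise variance ($\sigma^2/2$ in the notation of (4)), `c` is the clipping constant, and the function names are hypothetical. It is meant to show the quadratic-to-linear transition, not to reproduce the exact constants of the paper.

```python
import math


def huber_rho(x, c, s):
    """Huber penalty: quadratic for |x| <= c * s, linear beyond that threshold."""
    if abs(x) <= c * s:
        return x * x / (2.0 * s)
    return c * abs(x) - c * c * s / 2.0


def huber_psi(x, c, s):
    """Derivative of huber_rho: linear inside the threshold, clipped to +/- c outside."""
    if abs(x) <= c * s:
        return x / s
    return math.copysign(c, x)


# Small residuals are treated as in least squares; large ones are clipped.
for residual in (0.5, 2.0, 10.0):
    print(residual, huber_rho(residual, c=1.5, s=2.0), huber_psi(residual, c=1.5, s=2.0))
```

Clipping the derivative is what limits the influence of impulsive samples on the cost function.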
4. SLOWEST-DESCENT-SEARCH DETECTION
Clearly the optimal performance is achieved by the
exhaustive-search detector with the log-likelihood penalty
function, i.e., the ML detector. As will be seen in Section 5, the
performance of the exhaustive-search detector with the Huber
penalty function is close to that of the ML detector,
while this detector does not require knowledge of the
exact noise pdf. However, the computational complexity of the
exhaustive-search detector (6) is on the order of $O(2^K)$. We
next propose a local search approach to approximating the
solution to (6). The basic idea is to minimize the cost function
C(b; y) over a subset $\Omega$ of the discrete parameter set
$\{-1,+1\}^K$ that is close to the continuous stationary point
$\beta$ given by (7). More precisely, we approximate the solution
to (6) by
$\hat{\mathbf{b}} = \arg\min_{\mathbf{b} \in \Omega} C(\mathbf{b}; \mathbf{y})$. (15)
In the slowest descent method [3], the candidate set $\Omega$ consists
of the discrete parameters chosen such that they are
in the neighborhood of Q (Q < K) lines in $\mathbb{R}^K$, which are
defined by the stationary point $\beta$ and the Q eigenvectors of
the Hessian matrix $\nabla^2 C(\beta)$ of C(b; y) at $\beta$ corresponding to
the Q smallest eigenvalues. The basic idea of this method
is explained next.
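The candidate-set construction that the following paragraphs describe can be sketched for the least-squares penalty, where the Hessian is simply $\Psi^T \Psi$. This is a minimal illustration under several assumptions: the matrix $\Psi$ and the data are random, the helper names (`ls_cost`, `slowest_descent_detect`, `exhaustive_detect`) are hypothetical, and the exhaustive search is included only to compare costs for a small K.

```python
import itertools
import numpy as np


def ls_cost(b, y, Psi):
    """Least-squares cost, i.e. (5) with rho(x) = x^2 / 2."""
    r = y - Psi @ b
    return 0.5 * float(r @ r)


def slowest_descent_detect(y, Psi, Q):
    """Minimize the LS cost over candidates near Q slowest-descent lines through beta."""
    K = Psi.shape[1]
    beta = np.linalg.lstsq(Psi, y, rcond=None)[0]     # continuous minimizer, Eq. (7)
    b_star = np.where(beta >= 0, 1.0, -1.0)           # decorrelator decision, Eq. (8)
    _, vecs = np.linalg.eigh(Psi.T @ Psi)             # LS Hessian, ascending eigenvalues
    candidates = [b_star]
    for q in range(Q):
        g = vecs[:, q]                                # direction of q-th smallest eigenvalue
        for i in range(K):
            if g[i] == 0.0:
                continue                              # the line never crosses hyperplane i
            point = beta - (beta[i] / g[i]) * g       # intersection point, Eq. (16)
            b = np.sign(point)
            zero = b == 0
            b[zero] = -b_star[zero]                   # ties go to the side opposite b*
            b[i] = -b_star[i]                         # component i is zero by construction
            candidates.append(b)
    return min(candidates, key=lambda b: ls_cost(b, y, Psi))


def exhaustive_detect(y, Psi):
    """Brute-force search over all 2^K bit vectors, Eq. (6)."""
    K = Psi.shape[1]
    all_b = (np.array(b) for b in itertools.product((-1.0, 1.0), repeat=K))
    return min(all_b, key=lambda b: ls_cost(b, y, Psi))


rng = np.random.default_rng(1)
Psi = rng.standard_normal((16, 4))
b_true = rng.choice([-1.0, 1.0], size=4)
y = Psi @ b_true + 0.5 * rng.standard_normal(16)
b_sd = slowest_descent_detect(y, Psi, Q=2)
b_ex = exhaustive_detect(y, Psi)
```

The search evaluates at most QK + 1 candidates instead of $2^K$, and because $\mathbf{b}^*$ is always in the candidate set the result is never worse (in cost) than the decorrelator decision.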
Slowest-Descent Search: The basic idea of the slowest-descent
search method is to choose the candidate points in
$\Omega$ such that they are closest to a line ($\beta + \mu \mathbf{g}$) in $\mathbb{R}^K$,
originating from $\beta$ and along a direction g, where the cost
function C(b; y) increases at the slowest rate. Given any line
in $\mathbb{R}^K$, there are at most K points where the line intersects
the coordinate hyper-planes (e.g., $\beta^1$ and $\beta^2$ in Figure 1 for
K = 2). The set of intersection points corresponding to a
line defined by $\beta$ and g can be expressed as
$\{\beta^i = \beta - \mu_i \mathbf{g} : \mu_i = \beta_i / g_i\}_{i=1}^{K}$, (16)
where $\beta_i$ and $g_i$ denote the i-th elements of the respective
vectors $\beta$ and g. Each intersection point $\beta^i$ has only its
i-th component equal to zero, i.e., $\beta_i^i = 0$.
Any point on the line except for an intersection point
has a unique closest candidate point in $\{+1,-1\}^K$. An
intersection point is of equal distance from its two neighboring
candidate points, e.g., $\beta^1$ is equidistant to $\mathbf{b}^1$ and $\mathbf{b}^2$
in Figure 1(a). Two neighboring intersection points share
147
For the three types of the penalty functions, the Hessian
matrix at the stationary points are given respectively by
Figure 1 [panels (a) and (b)]: One-to-one mapping from $\{\beta, \beta^1, \ldots, \beta^K\}$ to
$\Omega = \{b^*, b^1, \ldots, b^K\}$ for $K = 2$. Each intersection point
$\beta^i$ is of equal distance from its two neighboring candidate
points. $b^i$ is chosen to be the one of these two candidate points
that is on the opposite side of the $i$-th coordinate hyperplane
with respect to $b^*$.
a unique closest candidate point, e.g., $\beta^1$ and $\beta^2$ share the
nearest candidate point $b^2$ in Figure 1(a). Note that $b^*$
in (8) is the candidate point closest to $\beta$. By carefully
selecting one of the two candidate points closest to each
intersection point, so as to avoid choosing the same point twice,
one can specify $K$ distinct candidate points in $\{+1,-1\}^K$
that are closest to the line $(\beta + \mu g)$. To that end, consider
the following set:
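$$\left\{ b^i \in \{-1,+1\}^K : b^i_k = \begin{cases} \mathrm{sign}(\beta^i_k), & k \neq i \\ -b^*_k, & k = i \end{cases} \right\}_{i=1}^{K}. \quad (17)$$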
It is seen that (17) assigns to each intersection point $\beta^i$ the
closest candidate point $b^i$ that is on the opposite side of the
$i$-th coordinate hyperplane from $b^*$ [cf. Figure 1(a), (b)].
In general, the slowest-descent search method chooses
the candidate set $\Omega$ in (15) as follows:
$$\Omega = \{b^*\} \cup \bigcup_{q=1}^{Q} \Big\{ b^{q,\mu} \in \{-1,+1\}^K : b^{q,\mu}_k = \begin{cases} \mathrm{sign}(\beta_k - \mu g^q_k), & \text{if } \beta_k - \mu g^q_k \neq 0 \\ -b^*_k, & \text{if } \beta_k - \mu g^q_k = 0 \end{cases},$$
$$\qquad g^q \text{ is the eigenvector of } \nabla^2_C(\beta) \text{ with the } q\text{-th smallest eigenvalue}, \;\; \mu \in \{\beta_1/g^q_1, \ldots, \beta_K/g^q_K\} \Big\}. \quad (18)$$
Hence, $\{b^{q,\mu}\}_{\mu}$ contains the $K$ closest neighbors of $\beta$ in
$\{-1,+1\}^K$ along the direction of $g^q$. Note that $\{g^q\}_{q=1}^{Q}$
represent the $Q$ mutually orthogonal directions along which the
cost function $C(b; y)$ grows the slowest from the minimum
point $\beta$. (In the case of the log-likelihood penalty function, this
corresponds to the situation where the likelihood function
drops the slowest from its peak, hence the name "slowest
descent".) Intuitively, the solution to (6) is most likely found
in this neighborhood.
Pml :
Vc(0) = *Tdiag \
^Pml {yj
(19)
Pls :
V2e((3) = *TV,
(20)
PH :
V2c((3) = 4>Tdiag J
}*■
(21)
where in (19) Pml(*) = V’mlO) - f"(x)/f(x) and in (21)
the indicator function 5(y < a) = 1 if y < a and 0 otherwise;
hence in this case those rows of with large residual signals
as a possible result of impulsive noise are nullified, whereas
other rows of are not affected.
Finally, we summarize the slowest-descent search algorithm
for multiuser detection in non-Gaussian noise. Given
a penalty function $\rho(\cdot)$, this algorithm solves the discrete
optimization problem (15) according to the following steps:
1. Compute the continuous stationary point $\beta$ in (7)
using the iteration (9)-(10);
2. Compute the Hessian matrix $\nabla^2_C(\beta)$ given by (19),
(20), or (21), and its $Q$ smallest eigenvectors $g^1, \ldots, g^Q$;
3. Solve the discrete optimization problem defined by
(15) and (18) by an exhaustive search (over $(KQ + 1)$
points).
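The three steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the stationary point `beta` and the Hessian are taken as given inputs (steps 1 and 2 depend on equations not reproduced here), and $b^*$ is assumed to be the sign vector of $\beta$.

```python
import numpy as np


def slowest_descent_candidates(beta, hessian, Q):
    """Build the candidate set Omega of (18) around the stationary point beta."""
    beta = np.asarray(beta, dtype=float)
    K = beta.size
    # Assumption: b* is the sign vector of beta (closest point of {-1,+1}^K).
    b_star = np.where(beta >= 0, 1.0, -1.0)
    # eigh returns eigenvalues in ascending order, so the first Q columns are
    # the eigenvectors belonging to the Q smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(np.asarray(hessian, dtype=float))
    candidates = [b_star]
    for q in range(Q):
        g = eigvecs[:, q]
        for i in range(K):
            if abs(g[i]) < 1e-12:
                continue  # the line never crosses the i-th coordinate hyperplane
            mu = beta[i] / g[i]
            point = beta - mu * g            # intersection point, cf. (16)
            b = np.where(point >= 0, 1.0, -1.0)
            zero = np.isclose(point, 0.0)
            zero[i] = True                   # i-th component is zero by construction
            b[zero] = -b_star[zero]          # opposite side from b*, cf. (17)-(18)
            candidates.append(b)
    # Drop duplicates; at most K*Q + 1 distinct candidates remain.
    return np.unique(np.array(candidates), axis=0)


def slowest_descent_detect(beta, hessian, Q, cost):
    """Step 3: exhaustive search of the cost over the small candidate set."""
    candidates = slowest_descent_candidates(beta, hessian, Q)
    costs = [cost(b) for b in candidates]
    return candidates[int(np.argmin(costs))]
```

Here `cost` is any callable that evaluates $C(b; y)$ for a candidate vector; the search touches at most $KQ + 1$ points instead of $2^K$.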
5. EXTENSION TO AN UNKNOWN
MULTIPATH CHANNEL
In this section, we extend the slowest-descent multiuser
detection techniques developed above to the asynchronous
CDMA system with multipath distortion. Following [4], [7],
and references therein, $r[i]$, the vector consisting of a number
of stacked one-symbol-length vectors that affect the
current symbol interval $i$, can be expressed as follows:
$$r[i] = H b[i] + n[i]. \quad (22)$$
Here, $b[i]$ and $n[i]$ are stacked symbol vectors, and $H$ is the
unknown channel matrix.
We can rewrite (22) as
$$r[i] = H\theta[i] + n[i] = U_s \zeta[i] + n[i].$$
Here, the orthonormal column vectors of $U_s$ span the column
space of $H$ and can be obtained using an eigendecomposition
of the received signal autocorrelation matrix (see [6]). The
estimation of the channel matrix $H$ is based on the users'
signature sequences and the noise subspace estimated from
the autocorrelation eigendecomposition (see [7]).
We next obtain the robust estimate of $\zeta[i]$ based on the
complex version of the decorrelative iterations (9)-(10) for
an (e.g., Huber) objective function:
$$z^l = \psi(r - U_s \zeta^l), \quad (23)$$
$$\zeta^{l+1} = \zeta^l + U_s^H z^l, \quad l = 0, 1, 2, \ldots \quad (24)$$
where $(\cdot)^H$ denotes the Hermitian transpose. $\theta[i]$ can be estimated
as follows:
$$\hat{\theta}[i] = (H^H H)^{-1} H^H U_s \zeta[i].$$
Note that
$$H\theta[i] = H_0 A b[i] + H_{\neq} \theta_{\neq}[i], \quad (25)$$
where the term $H_0 A b[i]$ contains the signal carrying the
current bits $b[i]$, and the term $H_{\neq}\theta_{\neq}[i]$ contains the signal
carrying the previous and future bits $\{b[l]\}_{l \neq i}$, i.e., intersymbol
interference. $A$ holds unknown phases, which are
estimated separately from the channel as demonstrated below.
We subtract the estimated intersymbol interference
from $r[i]$ to obtain
$$\tilde{r}[i] = r[i] - H_{\neq}\hat{\theta}_{\neq}[i] \quad (26)$$
$$\phantom{\tilde{r}[i]} = H_0 A b[i] + n[i]. \quad (27)$$
We can now set
H0$l{A} '
H0%{A} J ’
and use the methods described in previous sections to derive
the decorrelative and the slowest-descent estimates of $b[i]$ based
on $\tilde{r}[i]$.
the slowest-descent detector with 2 search directions, and
the exhaustive detector. Searching further slowest-descent
directions does not improve the performance in this case.
We observe that for all three criteria the performance of the
slowest-descent detector is close to the performance of its
respective exhaustive maximization version. All detectors
are significantly better than the LS-based detectors.
For the multi-path channel case the following is assumed:
processing gain $N = 15$, number of users $K = 6$,
each user's channel has 3 paths and a delay spread of up to
one symbol interval. The complex gains, the delays of each
user's channel, and the user signature sequences are generated
randomly. The chip pulse is a raised cosine pulse with roll-off
factor 0.5. The path gains are normalized so that each
user's signal arrives at the receiver with unit energy. The
over-sampling factor is 2 and the number of stacked vectors
in (22) (the smoothing factor) is 2.
Figure 3 demonstrates the performance of the Huber-based
slowest-descent method with one and two search directions,
the decorrelative Huber detector, and the blind
decorrelator from [6]. Most of the performance gain offered
by the slowest-descent method is obtained by searching
along only one direction. Over 1 dB of gain is obtained
relative to the decorrelative estimate. The blind approach
[6] performs poorly for this system.
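Estimation of A
We next consider the estimation of the complex amplitudes
$A$. Following (25), we have [recall that $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_K)$]
$$\theta_k = \alpha_k b_k + \tilde{n}_k, \quad k = 1, \ldots, K. \quad (28)$$
Since $b_k \in \{-1,+1\}$, it follows from (28) that the $\theta_k$ form two
clusters centered at $\alpha_k$ and $-\alpha_k$, respectively. Let $\alpha_k = \rho_k e^{j\phi_k}$;
a simple estimator of $\alpha_k$ is given by $\hat{\alpha}_k = \hat{\rho}_k e^{j\hat{\phi}_k}$
with
$$\hat{\rho}_k = E\{|\theta_k|\},$$
$$\hat{\phi}_k = \begin{cases} E\{\angle[\theta_k\, \mathrm{sign}(\Re\{\theta_k\})]\}, & \text{if } E\{|\Re\{\theta_k\}|\} > E\{|\Im\{\theta_k\}|\} \\ E\{\angle[\theta_k\, \mathrm{sign}(\Im\{\theta_k\})]\}, & \text{if } E\{|\Re\{\theta_k\}|\} < E\{|\Im\{\theta_k\}|\} \end{cases}$$
where the operator $E\{\cdot\}$ denotes the sample average. Note that
the above estimate of the phase $\phi_k$ has an ambiguity of
$\pi$, which necessitates differential encoding and decoding of
data.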
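A minimal sketch of this amplitude and phase estimator is given below. It assumes the samples of $\theta_k$ for one user are collected in a complex NumPy array; the function name is hypothetical, and the tie case (equal real and imaginary averages) is folded into the second branch as an arbitrary choice.

```python
import numpy as np


def estimate_complex_amplitude(theta):
    """Estimate alpha_k = rho_k * exp(j*phi_k) from samples theta_k = alpha_k*b_k + noise."""
    theta = np.asarray(theta, dtype=complex)
    rho = np.mean(np.abs(theta))                    # sample average of |theta_k|
    if np.mean(np.abs(theta.real)) > np.mean(np.abs(theta.imag)):
        flip = np.sign(theta.real)                  # fold both clusters using the real part
    else:
        flip = np.sign(theta.imag)                  # fold both clusters using the imaginary part
    phi = np.mean(np.angle(theta * flip))           # sample average of the folded phase
    return rho * np.exp(1j * phi)
```

Because the two clusters are folded onto one another, the result equals $\alpha_k$ only up to the sign ambiguity noted in the text.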
6. SIMULATION RESULTS
Figure 2: Symbol error performance of a synchronous DS-CDMA
system with $N = 15$, $K = 8$, $\epsilon = 0.01$, $\kappa = 100$.
7. CONCLUSION
For simulations, we assume a synchronous CDMA system
with a processing gain $N = 15$, number of users $K = 6$,
no phase offset and equal amplitudes of user signals, i.e.,
$\alpha_k = 1$, $k = 1, \ldots, K$. The signature sequence $s_1$ of User 1 is
generated randomly and kept fixed throughout simulations.
Signature sequences of Users 2 through $K$ are generated by
circularly shifting the sequence of User 1.
For each of the three penalty functions, Figure 2 presents
the symbol error performance of the decorrelative detector,
We have developed a new robust multiuser detection
technique based on the method of slowest-descent search.
By searching only over one or two directions, this method
offers significant performance improvement over the recently
proposed robust decorrelating detector in impulsive noise.
The proposed approach has been extended to multi-path
fading channels where the complex channels and signal phases of
all users have to be estimated blindly.
Figure 3: Symbol error performance of an asynchronous DS-CDMA
system with $N = 15$, $K = 8$, $\epsilon = 0.01$, $\kappa = 100$, in
an unknown multi-path channel with 3 randomly generated
path coefficients per user.
REFERENCES
[1] E.E. Kuruoglu, C. Molina, and W.J. Fitzgerald. Approximation
of α-stable probability densities using finite
mixtures of Gaussians. In Proc. EUSIPCO'98, Rhodes,
Greece, September 1998.
[2] D. Middleton. Non-Gaussian noise models in signal processing
for telecommunications: New methods and results
for class A and class B noise models. IEEE Trans.
Inform. Theory, 45(4):1122-1129, May 1999.
[3] P. Spasojevic. Sequence and channel estimation for
channels with memory. Department of Electrical Engineering,
Texas A&M University, 1999.
[4] X. Wang and H.V. Poor. Robust multiuser detection
in non-Gaussian channels. IEEE Trans. Sig. Proc.,
47(2):289-305, Feb. 1999.
[5] S.M. Zabin and H.V. Poor. Efficient estimation of the
class A parameters via the EM algorithm. IEEE Trans.
Inform. Theory, 37(1):60-72, Jan. 1991.
[6] X. Wang and H.V. Poor. Blind multiuser detection:
A subspace approach. IEEE Trans. Inform. Theory,
44(2):677-691, Mar. 1998.
[7] P. Spasojevic, X. Wang, and A. Høst-Madsen. Nonlinear
group-blind multiuser detection. Technical Report,
WINLAB, Rutgers Univ., July 2000.
MAXIMUM LIKELIHOOD DELAY-DOPPLER IMAGING
OF FADING MOBILE COMMUNICATION CHANNELS
Linda M. Davis†, Iain B. Collings‡, Robin J. Evans*
† Global Wireless Systems Research
Bell Laboratories
Lucent Technologies, AUSTRALIA
lindadavis@lucent.com
ABSTRACT
This paper presents a new recursive algorithm
for maximum likelihood estimation of the delay-
Doppler characteristics of fast-fading mobile com¬
munication channels. The channel is modelled
as an FIR filter with rapidly varying complex
coefficients. The parameters of interest are the
mean channel taps and the tap covariance. The
structure of the channel tap covariance matrix is
exploited to provide convergence to constrained
channel estimates.
1. INTRODUCTION
Maximum likelihood constrained covariance estimation
for directly observable processes in additive noise has
received considerable attention [1, 2, 3, 4] since many
algorithms in spectral analysis rely on knowledge of
the covariance matrix. Applications include harmonic
retrieval, beamforming and direction of arrival estimation.
In many such cases, the system of interest is
shift-invariant and the true covariance matrix is known
to be Hermitian Toeplitz as well as positive semidefinite.
This structure may be used in obtaining realistic
covariance matrix estimates, and in addition may be
exploited to provide fast convergence to constrained
estimates and aid subsequent processing (e.g. inverses,
eigendecomposition etc.).
In this paper, we consider the extension of constrained
covariance estimation to the case where the process of
interest is observed through convolution with a known
signal in addition to the additive noise. This problem
arises in delay-Doppler radar imaging [5] and delay-Doppler
imaging of fast-fading mobile communication
channels [6]. In these situations, the underlying reflectance
process has a time-varying impulse response,
and therefore is two-dimensional (in time, $k$, and
delay, $\ell$). The delay-Doppler image of a reflectance
process is also known as the scattering function, and is
related to the covariance matrix by a Fourier transform
(in the time axis indexed by $k$) [7].
‡ School of Electrical & Information Engineering
University of Sydney, AUSTRALIA
* Dept. Electrical & Electronic Engineering
University of Melbourne, AUSTRALIA
This paper presents a new algorithm for maximum likelihood
estimation of the covariance matrix (and therefore
the delay-Doppler characteristics) of fast-fading
mobile communication channels. Importantly, our algorithm
explicitly makes use of the structural constraints.
Key features of the algorithm include joint estimation
of the channel mean and covariance, and applicability
to a general class of wide-sense stationary (WSS)
channels.
2. CHANNEL MODEL
Channel Response
Consider a discrete equivalent baseband model in which
the complex-valued time-varying channel, or reflectance
process, $f_{k,\ell}$, represents the effect at time $k$ for reflections
with a path delay $\ell$. Ignoring the average delay
in the analysis, the observed signal is
$$z_k = \sum_{\ell=0}^{L-1} f_{k,\ell}\, x_{k-\ell} + w_k \quad (1)$$
where $L$ is the length of the finite impulse response
(FIR) channel, or the extent of the radar target, $x_k$ is
the known transmitted signal, and $w_k$ is the additive
noise introduced at the receiver.
Writing the observations for $k = 0, \ldots, N-1$ in vector
notation,
$$z = XF + w \quad (2)$$
0-7803-5988-7/00/$10.00 © 2000 IEEE
where the matrix of channel inputs is
$$X = \left[\, \mathrm{diag}(x_0, \ldots, x_{N-1}) \;\; \cdots \;\; \mathrm{diag}(x_{-L+1}, \ldots, x_{N-L}) \,\right]$$
and $F = [f_{0,0}, \ldots, f_{N-1,0}, \ldots, f_{N-1,L-1}]^T$.¹ The
time-varying channel (or reflectance) process, $F$, is seen
to be two-dimensional in that its elements are characterized
both by the time index $k$ and the delay index $\ell$.
When a line-of-sight path or specular (stable) reflections
exist between the transmitter and receiver, the
channel is no longer zero-mean. Thus $F = \tilde{F} + \bar{F}$, where
$\tilde{F}$ is the zero-mean time-varying component and $\bar{F}$ is
the mean component, constant over the observation interval,
$N$. Here $\bar{F}$ is an $NL \times 1$ vector, but contains only $L$
independent parameters. For convenience, we also define
the $L \times 1$ vector $G = (I \otimes e^T)\bar{F}$, and the corresponding
$N \times L$ matrix of channel inputs $Y = X(I \otimes 1)$, where
$e^T = [1, 0, \ldots, 0]$ is a $1 \times N$ unit vector, $1 = [1, \ldots, 1]^T$
is an $N \times 1$ vector of ones, $\otimes$ is the Kronecker product
operator, and $I$ is the $L \times L$ identity matrix.
¹ The transpose operator is denoted $(\cdot)^T$, and $(\cdot)^H$ denotes a
Hermitian transpose.
Channel Covariance
The dimensionality of the channel impulse response is
reflected in the structure of the covariance matrix
$$R = E[\tilde{F}\tilde{F}^H] = \begin{bmatrix} R_{0,0} & \cdots & R_{0,L-1} \\ \vdots & & \vdots \\ R_{L-1,0} & \cdots & R_{L-1,L-1} \end{bmatrix} \quad (3)$$
which consists of $L \times L$ blocks of $N \times N$ matrices, $R_{\ell_1,\ell_2}$,
which represent the covariance between taps (or reflectors)
at delays $\ell_1$ and $\ell_2$. For the radar target, where
scatterers are assumed to behave independently (i.e.
uncorrelated scatterers (US)) [5], the off-diagonal matrices
will all be zero. However, for the communication
channel model, the inclusion of the transmitter and receiver
pulse shapes in the equivalent channel response,
$f_{k,\ell}$, means that this is not the case.
When the statistics of the fading or reflectance process
are wide-sense stationary (WSS) (in the dimension
indexed by $k$), the covariance matrices $R_{\ell_1,\ell_2}$ are
Toeplitz. The overall matrix is then Hermitian symmetric
and block-Toeplitz. The set of Hermitian block-Toeplitz
matrices is denoted here by $T_{N,L}$.
The Hermitian block-Toeplitz channel covariance matrix,
$R \in T_{N,L}$, may be written as
$$R = \sum_{m=1}^{M} r_m Q_m \quad (4)$$
where $r_m$ are the values of the real and imaginary components
of elements of $R$. There are $M = 2NL^2 - L^2$
independent parameters, $r_m$. The channel covariance
matrix is (by definition) positive semidefinite. This
manifests itself as a highly nonlinear constraint on the
parameters, $r_m$.
Assuming additive white Gaussian noise (AWGN) at
the receiver, the channel covariance, $R$, is related to the
$N \times N$ observation covariance matrix, $R_z = E[\tilde{z}\tilde{z}^H]$,
by
$$R_z = X R X^H + \sigma_w^2 I \quad (5)$$
where $\sigma_w^2$ is the variance of the observation noise, and
the observation is $z = \tilde{z} + \bar{z}$, where $\bar{z} = X\bar{F} = YG$ is
the mean response.
3. MAXIMUM LIKELIHOOD CHANNEL
ESTIMATION
To adequately identify the channel, we require estimates
for the vector of channel tap means, $G$, and the
matrix of channel tap covariances, $R$. It is important
that the estimates maximize the likelihood over the set
of admissible structured matrices $R \in T_{N,L}$.
It is easily shown that maximizing the likelihood function
for the channel model of Section 2 is the same as
maximizing the following expression
$$\Phi(G, R) = -\ln\det R_z - \mathrm{tr}\{R_z^{-1} S\} \quad (6)$$
where the sample covariance matrix, $S = (z - YG)(z - YG)^H$,
is a function of the mean channel $G$, and $R_z$ is
a function of the channel covariance $R$ as given above
in (5). Here, $\mathrm{tr}\{\cdot\}$ denotes the trace operator.
Note that the likelihood, and hence $\Phi(G, R)$, is only
defined when $R_z$ is strictly positive definite.
Lemma 1 When G is given by
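$$G = (Y^H R_z^{-1} Y)^{-1} Y^H R_z^{-1} z \quad (7)$$
the first differential of the likelihood objective (6) is
$$d\Phi = \mathrm{tr}\{X^H R_z^{-1}(S - R_z) R_z^{-1} X\, dR\} \quad (8)$$
Proof The first differential of the objective function
(6) is [8]
$$d\Phi = -d(\ln\det R_z) - \mathrm{tr}\{d(R_z^{-1})\, S\} - \mathrm{tr}\{R_z^{-1}\, dS\}$$
Now, the first term is given by
$$d(\ln\det R_z) = \mathrm{tr}\{R_z^{-1}\, dR_z\}$$
and the differential of an inverse is given by [8, pg 151]
$$d(R_z^{-1}) = -R_z^{-1}\, dR_z\, R_z^{-1}$$
Thus
$$d\Phi = -\mathrm{tr}\{R_z^{-1}\, dR_z\} + \mathrm{tr}\{R_z^{-1}\, dR_z\, R_z^{-1} S\} + 2\,\mathrm{tr}\{R_z^{-1} Y\, dG\, (z - YG)^H\}$$
$$\phantom{d\Phi} = \mathrm{tr}\{R_z^{-1}(S - R_z) R_z^{-1}\, dR_z\} + 2\,\mathrm{tr}\{(z - YG)^H R_z^{-1} Y\, dG\} \quad (9)$$
Substituting (7) into (9), together with $dR_z = X\, dR\, X^H$ from (5), gives (8). ■
Unfortunately, due to non-linearities in (8) and the
need for a positive definite solution, it is infeasible to
obtain an analytic maximum likelihood solution for the
covariance matrix using (8) by setting $d\Phi = 0$. We now
present our main result in the following theorem, which
leads us to our recursive algorithm in Section 4 for finding
an admissible maximum likelihood solution.
Theorem 1 The sequence of covariance matrices,
$\{R_i\}$, and channel tap means, $\{G_i\}$, generated by the
following iterative equations (10)-(12), monotonically
increases in likelihood
$$R_i = R_{i-1} + \alpha_i (R'_i - R_{i-1}) \quad (10)$$
$$R_{z,i} = X R_i X^H + \sigma_w^2 I \quad (11)$$
$$G_i = (Y^H R_{z,i}^{-1} Y)^{-1} Y^H R_{z,i}^{-1} z \quad (12)$$
where $\alpha_i > 0$ is an arbitrarily small stepsize, and where
$R'_i = \sum_{n=1}^{M} r'_{n,i} Q_n$ for $r'_{n,i}$ satisfying the following set of
equations, for $m = 1, \ldots, M$:
$$\sum_{n=1}^{M} \mathrm{tr}\{X^H R_{z,i-1}^{-1} X Q_n X^H R_{z,i-1}^{-1} X Q_m\}\, r'_{n,i} = \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - \sigma_w^2 I) R_{z,i-1}^{-1} X Q_m\} \quad (13)$$
where $R_{z,i-1} = X R_{i-1} X^H + \sigma_w^2 I$ and $S_{i-1} = (z - Y G_{i-1})(z - Y G_{i-1})^H$.
The initial $R_0$ must be positive
definite Hermitian block-Toeplitz (e.g. $R_0 = I$).
Proof From (8), consider the differential of the likelihood
objective function at iteration $i$
$$d\Phi_i = \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1}) R_{z,i-1}^{-1} X\, dR_i\}$$
$$\phantom{d\Phi_i} = \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X\, dR_i\, X^H) R_{z,i-1}^{-1} X\, dR_i\}$$
$$\phantom{d\Phi_i =} + (1/\alpha_i)\,\mathrm{tr}\{X^H R_{z,i-1}^{-1} X\, dR_i\, X^H R_{z,i-1}^{-1} X\, dR_i\} \quad (14)$$
In order to prove that the recursion (10) increases the
likelihood (i.e. when $dR_i = \alpha_i (R'_i - R_{i-1})$), we now
proceed to show that the second term in (14) is positive,
and the first term is zero.
The second term in (14) may be written
$$(1/\alpha_i)\,\mathrm{tr}\{X^H R_{z,i-1}^{-1} X\, dR_i\, X^H R_{z,i-1}^{-1} X\, dR_i\}$$
$$= (1/\alpha_i)\,\mathrm{tr}\{X^H A A^H X\, dR_i\, X^H A A^H X\, dR_i\}; \quad R_{z,i-1}^{-1} = A A^H \text{ since p.d.}$$
$$= (1/\alpha_i)\,\mathrm{tr}\{A^H X\, dR_i\, X^H A\, A^H X\, dR_i\, X^H A\}$$
$$= (1/\alpha_i)\,\mathrm{tr}\{B B^H\}; \quad B = A^H X\, dR_i\, X^H A \text{ and } dR_i = dR_i^H$$
$$> 0$$
Now, before considering the first term in (14), consider
(13), which can be written
$$\mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - \sigma_w^2 I) R_{z,i-1}^{-1} X Q_m\} - \mathrm{tr}\{X^H R_{z,i-1}^{-1} X R'_i X^H R_{z,i-1}^{-1} X Q_m\} = 0$$
$$\mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - \sigma_w^2 I - X R_{i-1} X^H - (1/\alpha_i) X\, dR_i\, X^H) R_{z,i-1}^{-1} X Q_m\} = 0; \quad \text{since } R'_i = R_{i-1} + (1/\alpha_i)\, dR_i$$
$$\mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X\, dR_i\, X^H) R_{z,i-1}^{-1} X Q_m\} = 0$$
$$\sum_{m=1}^{M} \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X\, dR_i\, X^H) R_{z,i-1}^{-1} X Q_m\}\, dr_m = 0$$
$$\mathrm{tr}\Big\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X\, dR_i\, X^H) R_{z,i-1}^{-1} X \sum_{m=1}^{M} Q_m\, dr_m\Big\} = 0$$
Since $dR_i = \sum_{m=1}^{M} Q_m\, dr_m$, the first term in (14) is zero.
■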
Remark 1 Theorem 1 utilizes the inverse iteration
argument of [1]. However, this new result is applicable
when the process of interest is not necessarily zero-mean
and is observed via convolution in additive noise.
We have included estimation of both the real and imaginary
parts of each of these channel parameters which
arise from the baseband model. Since the result is not
restricted to zero-mean and uncorrelated scatterer models,
it leads to a more generally applicable algorithm
than the circulant extension algorithm of [5].
4. RECURSIVE ALGORITHM
Theorem 1 provides us with a recursive algorithm for
maximum likelihood channel mean and structured covariance
estimates. However, in order to perform an
iteration (10)-(12), we must first solve (13). At each
iteration $i$, this can be done by forming a vector $x =
[r'_{1,i}, \ldots, r'_{M,i}]^T$ and an $M \times 1$ vector $b$ with elements
given by the RHS of (13) for $m = 1, \ldots, M$. Now the
set of equations in (13) for $m = 1, \ldots, M$ can be written
in the form $Ax = b$, where we can now solve for
$x$. It is easily shown that at each iteration $A$ is
positive definite, and therefore efficient algorithms can
be employed in the solution.
This new recursive estimation algorithm is in fact a linearized
gradient algorithm, as can be seen from the linear
equation (13). The formulation can easily be extended
for multiple observations.
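One iteration of the recursion can be sketched as follows. This is a minimal NumPy sketch under stated assumptions rather than the authors' code: all function names are hypothetical, the noise variance `sigma2` is assumed known, and the example basis covers only the single-tap case $L = 1$ (plain Hermitian Toeplitz, $M = 2N - 1$); any list of basis matrices $Q_m$ can be passed instead.

```python
import numpy as np


def hermitian_toeplitz_basis(N):
    """Basis Q_m with real coefficients for N x N Hermitian Toeplitz matrices (L = 1)."""
    basis = [np.eye(N, dtype=complex)]
    for k in range(1, N):
        upper, lower = np.eye(N, k=k), np.eye(N, k=-k)
        basis.append((upper + lower).astype(complex))  # real part of lag k
        basis.append(1j * upper - 1j * lower)          # imaginary part of lag k
    return basis


def gls_mean(Y, Rz, z):
    """Channel tap mean for a given observation covariance, as in (12)."""
    Rz_inv_Y = np.linalg.solve(Rz, Y)
    return np.linalg.solve(Y.conj().T @ Rz_inv_Y, Rz_inv_Y.conj().T @ z)


def objective(X, Y, z, R, G, sigma2):
    """Likelihood objective -ln det Rz - tr{Rz^-1 S}, as in (6)."""
    Rz = X @ R @ X.conj().T + sigma2 * np.eye(len(z))
    resid = z - Y @ G
    _, logdet = np.linalg.slogdet(Rz)
    return float(-logdet - np.real(resid.conj() @ np.linalg.solve(Rz, resid)))


def ml_iteration(X, Y, z, R_prev, G_prev, basis, sigma2, alpha):
    """Solve the linear system (13) as A x = b, then apply the updates (10)-(12)."""
    N = len(z)
    Rz_prev = X @ R_prev @ X.conj().T + sigma2 * np.eye(N)
    W = np.linalg.solve(Rz_prev, X)                    # Rz^-1 X
    M_mat = X.conj().T @ W                             # X^H Rz^-1 X
    resid = z - Y @ G_prev
    S = np.outer(resid, resid.conj())                  # sample covariance S_{i-1}
    C = W.conj().T @ (S - sigma2 * np.eye(N)) @ W      # X^H Rz^-1 (S - s2 I) Rz^-1 X
    MQ = [M_mat @ Q for Q in basis]
    A = np.array([[np.trace(MQn @ MQm).real for MQn in MQ] for MQm in MQ])
    b = np.array([np.trace(C @ Q).real for Q in basis])
    r = np.linalg.solve(A, b)
    R_target = sum(rm * Q for rm, Q in zip(r, basis))
    R_new = R_prev + alpha * (R_target - R_prev)       # update (10)
    Rz_new = X @ R_new @ X.conj().T + sigma2 * np.eye(N)   # update (11)
    return R_new, gls_mean(Y, Rz_new, z)               # update (12)
```

The sketch uses a general dense solver for $Ax = b$; a Cholesky-based solver would be a natural substitute when $A$ is positive definite, and the stepsize `alpha` is left to the caller.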
Remark 2 For the directly observable case presented
in [1], it was sufficient to confine the estimates of the
structured covariance matrix at each iteration to the
positive definite region (by appropriate choice of the
stepsize) to obtain an admissible maximum likelihood
solution. Note, however, that for the case presented here
$R_z$ (5) may be positive definite even when the estimate
of $R$ is not. Since the maximum of the objective
(6) may occur in this region, gradient algorithms may
not be guaranteed to find an admissible ($R \in T_{N,L}$)
maximum of the objective function.
Example A
Our new recursive algorithm was first tested on a zero-mean
US channel. The channel was simulated with
$L = 2$ independent equal-power fading taps with a
Jakes' Doppler spectrum. The signal to noise ratio
(SNR) was nominally chosen to be 10 dB. The dimension
of the covariance matrix was $NL = 50$ and 75
samples of the channel output were used in the estimation
(representing a multiple observation factor of
3). The stepsize at each iteration, $\alpha_i$, was chosen to
confine the corresponding estimate $R_i$ to the positive
definite region.
Figure 1 shows the progression of the objective maximization
with respect to computational effort. Also
shown is the progression of the algorithm of [5], using
a factor of 2 for the circulant extension. The scaling
of the curves relative to the computational effort was
based on counts of floating point operations in Matlab
for unoptimized code in both cases, and therefore the
figure is only indicative of a performance comparison.
Importantly, Figure 1 shows that the restriction of the
estimates $R_i$ to the positive definite region at each iteration
may result in trapping the algorithm at the positive
definite boundary when a solution with greater
likelihood exists in the admissible region.
Figure 1: Maximization of the likelihood objective relative
to computational effort, estimates $R_i$ restricted
to the positive definite region
Remark 3 The algorithm in [5] for US channels exploits
the circulant extension property of Toeplitz matrices
[9], and has been shown to be an instance of the
expectation-maximization (EM) algorithm. With sensible
initialization, this algorithm maintains a positive
definite estimate of $R$. However, due to the augmentation
of the Toeplitz matrix to a circulant matrix, the
estimation problem is modified, and conditions for convergence
to an admissible maximum of (6) have not yet
been established.
Modification of the gradient
To pursue the admissible maximum likelihood solution,
modification of the gradient is required to allow
movement tangential to the positive definite boundary
whilst maintaining the positive definite constraint on
the estimates $R_i$. Due to the complexity of the relationship
between the positive definite constraint and
the parameters $r_m$, no obvious modification strategy is
apparent.
A simple modification we have found is to replace the
set of linear equations for calculating $R'_i$ (13) with
$$\sum_{n=1}^{M} \mathrm{tr}\{R_{i-1}^{-1} Q_n R_{i-1}^{-1} Q_m\}\, r'_{n,i} = \mathrm{tr}\{X^H R_{z,i-1}^{-1} S_{i-1} R_{z,i-1}^{-1} X Q_m\} \quad (15)$$
It is an unproven conjecture of this paper that this
modified algorithm converges to an admissible maximum
likelihood solution for the structured covariance
matrix.
Figure 2: Maximization of the likelihood objective relative
to computational effort, modified gradient
Figure 3: Delay-Doppler profile of the simulated channel
Example B
The experiment of Example A was repeated using the
modified gradient described above. Note the smooth
trajectory of the modified algorithm, suggesting that
the algorithm is no longer trapped prematurely. Also
shown is the likelihood (V) obtained for the structured
covariance matrix estimate without the positive definite
constraint.
Figure 3 shows the delay-Doppler profile of the simulated
channel. Figures 4 and 5 show the corresponding
estimates of the delay-Doppler spectrum. Improvement
can be obtained using more data (with correspondingly
more computational effort) and/or higher SNR. Further
trials show that the modified gradient algorithm
is robust in estimating the channel mean, with good
mean estimates and negligible impact on the covariance
estimate and delay-Doppler profile.
Figure 4: Estimated delay-Doppler profile, circulant
extension algorithm
Figure 5: Estimated delay-Doppler profile, modified
gradient algorithm
REFERENCES
[1] J. P. Burg, D. G. Luenberger, and D. L. Wenger, "Estimation
of structured covariance matrices," in Proceedings
of the IEEE, vol. 70, pp. 963-974, Sept. 1982.
[2] M. I. Miller and D. L. Snyder, "The role of likelihood
and entropy in incomplete-data problems: Applications
to estimating point-process intensities and Toeplitz constrained
covariances," Proceedings of the IEEE, vol. 75,
pp. 892-907, July 1988.
[3] A. Dembo, C. L. Mallows, and L. A. Shepp, "Embedding
nonnegative definite Toeplitz matrices in nonnegative
definite circulant matrices, with application to covariance
estimation," IEEE Trans. on Information Theory,
vol. 35, pp. 1206-1212, Nov. 1989.
[4] L. M. Davis, R. J. Evans, and E. Polak, "Maximum likelihood
estimation of positive definite Hermitian Toeplitz
matrices using Outer Approximations," in Proc. of
IEEE Workshop on Statistical Signal and Array Processing
(SSAP'98), (Portland, OR, USA), pp. 49-52,
Sept. 1998.
[5] D. L. Snyder, J. A. O'Sullivan, and M. I. Miller, "The
use of maximum likelihood estimation for forming images
of diffuse radar targets from delay-Doppler data,"
IEEE Trans. on Information Theory, vol. 35, pp. 536-548,
Nov. 1989.
[6] L. M. Davis, I. B. Collings, and R. J. Evans, "Estimation
of LEO satellite channels," in Int. Conf. on
Information, Communications and Signal Processing
(ICICS'97), vol. 1, (Singapore), pp. 15-19, Sept. 1997.
[7] H. L. Van Trees, Detection Estimation and Modulation
Theory, vol. III. Wiley, 1971.
[8] J. R. Magnus and H. Neudecker, Matrix Differential
Calculus with Applications in Statistics and Econometrics.
Wiley, 1988.
[9] R. M. Gray, "Toeplitz and circulant matrices: II," tech.
rep., Center for Systems Research, Stanford University,
Apr. 1977.
ENHANCED SPACE-TIME CAPTURE PROCESSING FOR RANDOM ACCESS CHANNELS
Alexandr M. Kuzminskiy, Kostas Samaras, Carlo Luschi and Paul Strauch
Bell Laboratories, Lucent Technologies
Unit 1, Pagoda Park, Westmead Drive
Swindon, Wiltshire SN5 7YT, UK
ak9@lucent.com
ABSTRACT
The problem of maximizing the throughput in a Random
Access Channel (RACH) in a TDMA-based system is addressed.
A general analysis of a Slotted ALOHA system is
presented which shows that the ability to recover more
than one user in a RACH collision can significantly improve
system performance. Three capture algorithms based
on semi-blind space-time filtering are proposed. Their efficiency,
compared to the conventional (power) training-based
capture algorithm, is demonstrated by means of simulations
in a GSM(EDGE) system. The best results are obtained for
a multistage version of the training-like algorithm based on
the Least Squares (LS) estimation of space-time filter coefficients.
1. INTRODUCTION
Cellular mobile communication systems such as GSM
make use of a RACH in order to enable the initial access of
the mobile stations to the network. Packet radio networks
(like GPRS and EGPRS) also make use of similar channels
called Packet Random Access Channels (PRACH), not only
for the initial access but also during the call, since channels
are allocated to users on a demand basis rather than permanently
(as in circuit-switched GSM). The random access
mechanism used in these systems is based on the Slotted
ALOHA principle [1]. The throughput in a Slotted ALOHA
random access channel in a TDMA-based system can be
improved by using capture effects. Most capture models
in TDMA-based systems rely on power capture [2], and
not more than one of the colliding packets can be recovered.
Specifically, when more than one packet arrives at the receiver
simultaneously, only one of them can be captured at
the receiver, given that its power exceeds a specified threshold.
Capture of more than one packet in a collision of many
leads to performance enhancement. We start from the general
analysis of a Slotted ALOHA system with capture. We
show that the throughput can be increased significantly if a
nonzero probability of capture of more than one packet in
a collision is assumed. Then we propose three capture algorithms
based on semi-blind space-time filtering. The first
one is based on a multistage procedure where each stage exploits
the conventional LS estimator with the ability to capture
at most one of the colliding packets. The second algorithm
is based on a training-like (TL) approach [5,6] that allows
us to introduce a nonzero probability of recovering more than
one user in a collision of many using a one-stage procedure.
The third one is a combination of the multistage and
training-like algorithms. Simulations in a GSM(EDGE) context
are presented, which demonstrate the superior performance
of the multiple capture algorithms compared to the
LS estimator.
2. CAPTURE EFFECTS IN A SLOTTED ALOHA
SYSTEM
In order to demonstrate the performance enhancement due
to space-time capture processing, we consider a simple S-ALOHA
system with a finite population of users, $N$. A generalization
of the model described in [2,3] is adopted where
the input load of the system is described by the probability
of packet arrival denoted as $p_0$. Each of the users (terminals)
generates single-packet messages with probability $p_0$.
A discrete time system is considered and transmissions of
packets occur only at the boundaries between two time slots.
If the transmission of a packet is not successful, the terminal
is backlogged and makes an attempt to retransmit the packet
in the next time slot with retransmission probability $p_r$. The
capture ability of the channel is described by the capture
matrix $C = [c(i,j)]$, where $c(i,j)$ denotes the probability
that there are $i$ successfully received packets given that there
are $j$ packet transmission attempts in the same time slot
($0 \le i, j \le N$). It is assumed that all transmitting terminals
are aware of the outcome of their transmissions before the
end of the time slot through an ideal feedback (downlink)
channel. The state of the system can be described by the
number $n$ of backlogged terminals ($0 \le n \le N$).
The steady state behavior of this discrete time Markovian
system is determined by the (N + 1) x (N + 1) transition
probabilities matrix $\Pi = [\pi_{n,m}]$, where $\pi_{n,m}$ is the probability
that the state of the system (population of backlogged
terminals) is $m$ during time slot $t + 1$, given that during time
slot $t$ the state was $n$. The adopted model allows us to express
these transition probabilities as follows:
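$$\pi_{n,m} = \sum_{i=0}^{N-n} \sum_{j=0}^{n} \sum_{k=\max\{n-m+i-j,\,0\}}^{\min\{n-m+i,\,i\}} \binom{N-n}{i} p_0^i (1-p_0)^{N-n-i} \binom{n}{j} p_r^j (1-p_r)^{n-j}\, c(k,i)\, c(n-m+i-k,\,j). \quad (1)$$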
The expression for the transition probabilities in [3] is a
special case of (1) when the capture matrix of the system
becomes:
$$c(i,j) = \begin{cases} 1 - q_j, & i = 0 \\ q_j, & i = 1 \\ 0, & i > 1 \end{cases} \quad (2)$$
where $q_j$ is the probability that one out of $j$ transmitted
packets is successfully received. A semi-analytical approach
has been followed for the calculation of the transition
probabilities. The elements of the capture probability
matrix $C$, for the purposes of this paper, have been
calculated through simulation. In particular, the elements
$c(i,j)$ with $1 \le i \le M_1$, $1 \le j \le M_2$ (typical values
$M_1 = 3$, $M_2 = 5$) are calculated via simulation, and
$c(0,j) = 1 - \sum_{i=1}^{M_1} c(i,j)$ for $1 \le j \le M_1$. Furthermore,
$c(0,0) = 1$ and $c(i,j) = 0$ for all other $(i,j)$.
The steady state distribution $P = \{P_k\}_{k=0}^{N}$ of the number
of backlogged users is given as the solution to the following
problem [4]:
$$P\,\Pi = P \quad (3)$$
under the constraint:
$$\sum_{k=0}^{N} P_k = 1. \quad (4)$$
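Numerically, (3) and (4) form a small linear system. The sketch below is one simple way to solve it and is not taken from the paper: the function name is hypothetical, and the transition matrix is assumed to be supplied as a row-stochastic NumPy array.

```python
import numpy as np


def steady_state(Pi):
    """Solve P Pi = P subject to sum(P) = 1 for a row-stochastic matrix Pi."""
    Pi = np.asarray(Pi, dtype=float)
    n = Pi.shape[0]
    # Stack the balance equations (Pi^T - I) P = 0 with the normalization row.
    A = np.vstack([Pi.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    P, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return P
```

For the two-state chain `[[0.9, 0.1], [0.5, 0.5]]` the balance condition $0.1\,P_0 = 0.5\,P_1$ gives $P = (5/6, 1/6)$, which the function reproduces.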
As a performance metric the average number of successfully transmitted
packets per time slot has been chosen, which is referred to as the average
throughput $S$. The average throughput can be calculated as follows:
$$S = \sum_{n=0}^{N} S(n) P_n, \tag{5}$$
where $S(n)$ denotes the number of successful packet transmissions when the
system is in state $n$ and can be calculated by:
$$S(n) = \sum_{m=0}^{N} \sum_{i=0}^{N-n} \sum_{j=0}^{n} (n - m + i)\, \pi_{n,m}^{(i,j)}, \tag{6}$$
where $\pi_{n,m}^{(i,j)}$ is the term of (1) corresponding to given $i$ and $j$.
A possibility to improve the system performance by means of capture effects
is illustrated in Figure 1, where the average system throughput is plotted as
a function of the retransmission probability for no capture and for the ideal
capture ($S = N p_0$), with $N = 10$ and $p_0 = 0.2$. One can see the
significant gap between these two boundary cases, which can be filled by
curves corresponding to algorithms with multiple capture ability.
Figure 1: Slotted ALOHA throughput performance for the
boundary cases
3. MULTIPLE CAPTURE PROBLEM FORMULATION
The model of a RACH collision is shown in Figure 2. The main assumptions are:
1) all colliding signals and Co-Channel Interference (CCI) have the known
structure of a timeslot (GSM, for example) and they are received
synchronously,
2) all signals are from the same finite alphabet (FA)
$\{a_h,\ h = 1, \dots, J\}$ and all of them have the same training sequence,
which is different from the CCI training sequence,
3) channel coding is used (successful capture can be detected by means of a
parity check),
4) multiple antennas are used at the receiver (space-time interference
rejection filtering can be applied),
5) propagation channels for all colliding signals are stationary over the
whole time slot (coefficients of a space-time filter can be adjusted by means
of off-line algorithms).
The main difficulty in recovering more than one user is that the training
data for all access packets in one cell is the same. This means that
training-based algorithms cannot be directly applied for multiple capture
reception. Blind techniques could be applicable, but the short burst nature
of Slotted ALOHA systems makes them unrealistic because of the finite amount
of data effects [7]. A possibility to address this problem by means of
semi-blind space-time filtering algorithms is studied in this paper.
Note: The important feature of the considered problem is that some
probability of access failure can be acceptable for RACH systems. Thus,
solutions without proven ability to recover all colliding signals in every
time slot may be useful.
Figure 2: Model of a RACH collision ("training-like symbols" may occupy any
place in a payload)
4. MULTIPLE CAPTURE ALGORITHMS
4.1. Multistage algorithm
A multistage processing based on cancellation of the recovered signals from
the received signal at successive stages has been considered for different
applications, for example in [8,9]. A possible way to implement this
technique in the considered problem is presented in Figure 3 (two stages are
shown for simplicity). The conventional LS algorithm is used at each stage in
the Space-time Filter. The possible number of stages can be found from the
applicability condition (misadjustment) for the Noise Canceller [10]:
(Number of stages - 1) * Length of channels <
Number of information symbols in a timeslot
The advantage of this algorithm is that more than one user may be captured if
the first stage is successful. The disadvantage is that no signals can be
recovered if there is no capture at the first stage. We refer to this
straightforward algorithm as the MLS (multistage LS) and consider it as a
reference point for the enhanced algorithms introduced in the next two
subsections.
Figure 3: Structure of the MLS algorithm
4.2. One stage training-like algorithm
According to a general TL approach [5], our proposal is to use a few
information symbols in the payload as an extension of the training sequence.
These symbols may be different for different users. Thus, the enlarged
training sequences may be linearly independent and the LS estimator based on
these TL sequences can be applied. In Figure 2 these information symbols are
indicated as the TL symbols. The coefficients of the space-time filters and
the signal estimates corresponding to the TL sequences can be found for the
FA signals using the following training-like LS (TLLS) algorithm:
- form the $J^{N_{TL}}$ TL sequences
$$S_m^{TL} = \{s(n_1)\, s(n_2) \dots s(n_{N_T})\; s_m(m_1)\, s_m(m_2) \dots s_m(m_{N_{TL}})\}, \tag{7}$$
where $s(n_i)$, $i = 1, \dots, N_T$ are the training symbols,
$\{s_m(m_1)\, s_m(m_2) \dots s_m(m_{N_{TL}})\}$ are all $J^{N_{TL}}$ possible
sequences of the FA signal of length $N_{TL}$; $n_i$ and $m_j$ are the
positions of the known and TL symbols ($n_i$, $i = 1, \dots, N_T$ are known,
$m_j$, $j = 1, \dots, N_{TL}$ must be selected);
- calculate the LS estimates of the weight vectors using the TL sequences
$$W_m = (R + \delta I)^{-1} P_m, \quad m = 1, \dots, J^{N_{TL}}, \tag{8}$$
where
$$R = \sum_{i=1}^{N_T} X(n_i) X^*(n_i) + \sum_{j=1}^{N_{TL}} X(m_j) X^*(m_j); \tag{9}$$
$$P_m = \sum_{i=1}^{N_T} s^*(n_i) X(n_i) + \sum_{j=1}^{N_{TL}} s_m^*(m_j) X(m_j); \tag{10}$$
where $X$ is the vector of input signals and $\delta$ is the regularization
coefficient [11] for the conventional LS estimator, which usually is chosen
to be close to the variance of the noise;
- select $M_1$ weight vectors which minimize the distance from the FA $Q_m$
$$\hat{W}_j = W_{m_j}, \quad m_j = \arg\min_m Q_m, \quad j = 1, \dots, M_1, \tag{11}$$
$$Q_m = \sum_{n=1}^{N_s} \min_h \left( |a_h - W_m^* X(n)| \right), \tag{12}$$
where $N_s$ is the number of symbols in a time slot;
- calculate signal candidates
$$\hat{s}_j(n) = \hat{W}_j^* X(n), \tag{13}$$
- apply parity check to each signal candidate and accept different signal
candidates with the positive parity check as the captured packets.
The drawback of this solution is that the number of the TL sequences grows
exponentially with the number of the TL symbols. Thus, only a small number of
the TL symbols can be implemented. Certainly, in this situation we cannot
guarantee the possibility to recover all signals in a collision in each
timeslot. Nevertheless, according to the Note in Section 3 this is not
necessary in the considered problem. We have introduced a multiple capture
ability in a one stage procedure and, in Section 5, we will demonstrate the
performance improvement for only two TL symbols in the GSM(EDGE) environment.
4.3. Multistage training-like algorithm
Capture ability can be additionally improved by means of multistage
processing similar to that presented in Section 4.1 when the TLLS algorithm
is used instead of the LS estimator. We refer to this algorithm as the MTLLS
(multistage TLLS).
5. SIMULATION RESULTS
Two receive antennas in a typical GSM ($J = 2$) urban scenario TU50 are
assumed, where SNR = 35 dB and SIR = 6 dB. In all cases a space-time filter
with five coefficients in each channel is used. For each time slot, the
transmitted bits are obtained by channel encoding of one data block. The
channel coding scheme includes a (34,28) systematic cyclic redundancy check
(CRC) code (which accepts 28 bits at the input and provides 6 parity check
bits at the output), and a (3,1,5) convolutional code (rate 1/3, constraint
length 5).
A possibility to capture more than one user in a collision for the TLLS
algorithm is illustrated in Figure 4, where the typical curves for the
selection criteria (distance from the FA) are shown for $N_{TL} = 4$ (16 TL
sequences for the binary FA) in the case of two colliding users ($M_2 = 2$).
All situations are presented in Figure 4: no capture, one of two users is
captured, and two of two users are captured. Our goal is to estimate
probabilities of these events for different $M_2$ and then to calculate the
system performance according to the semi-analytical procedure presented in
Section 2. The capture simulation results (estimated probabilities $p_i$,
$i = 1, 2, 3$ to recover one, two or three colliding packets) are given in
Table 1 for the conventional LS algorithm (at most one signal can be
captured), for the TLLS with $N_{TL} = 2$, and for the MTLLS with the same
$N_{TL}$.
Figure 4: Illustration of the selection step in the TLLS for $N_{TL} = 4$
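The TLLS steps (7)-(13) of Section 4.2 can be sketched in a few lines. This
is only an illustrative sketch under simplifying assumptions: a single user,
a noise-free flat channel with two antennas and hypothetical coefficients, a
binary FA, and no parity check (the candidates are simply ranked by $Q_m$).
The function name `tlls` and its argument names are not from the paper.

```python
import itertools
import numpy as np

def tlls(X, s_train, n_idx, m_idx, alphabet, delta, M1):
    """Training-like LS sketch. X is (D, Ns); column n is the input vector X(n)."""
    D = X.shape[0]
    a = np.asarray(alphabet)
    Xt, Xtl = X[:, n_idx], X[:, m_idx]
    # (9): correlation matrix over the training and TL positions
    R = Xt @ Xt.conj().T + Xtl @ Xtl.conj().T
    R_inv = np.linalg.inv(R + delta * np.eye(D))
    P_train = Xt @ np.conj(s_train)                  # first sum of (10)
    results = []
    # (7): enumerate all J**N_TL candidate TL sequences
    for seq in itertools.product(alphabet, repeat=len(m_idx)):
        P_m = P_train + Xtl @ np.conj(np.array(seq))  # (10)
        W_m = R_inv @ P_m                             # (8)
        y = W_m.conj() @ X                            # W_m^* X(n) for all n
        # (12): summed distance of the filter output from the finite alphabet
        Q_m = np.abs(y[None, :] - a[:, None]).min(axis=0).sum()
        results.append((Q_m, W_m))
    results.sort(key=lambda t: t[0])                  # (11): smallest Q_m first
    W_hat = [W for _, W in results[:M1]]
    S_hat = [W.conj() @ X for W in W_hat]             # (13): signal candidates
    Q_hat = [Q for Q, _ in results[:M1]]
    return W_hat, S_hat, Q_hat

# Toy scenario (all values hypothetical).
rng = np.random.default_rng(0)
Ns = 40
s = rng.choice([-1.0, 1.0], size=Ns)          # binary symbols of one user
h = np.array([1.0 + 0.5j, -0.3 + 0.8j])       # flat channel, two antennas
X = h[:, None] * s[None, :]                   # noise-free received vectors
n_idx = np.arange(0, 8)                       # training positions
m_idx = np.array([20, 30])                    # N_TL = 2 TL positions
W_hat, S_hat, Q = tlls(X, s[n_idx], n_idx, m_idx, [-1.0, 1.0], 1e-3, 1)
```

In this toy case the TL sequence that matches the transmitted payload symbols
gives a distance $Q_m$ close to zero, while every other candidate leaves a
clearly larger residual, which is the selection behaviour the algorithm
relies on.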
Table 1. Estimated probabilities to capture one/two/three packets in a
collision of one/.../five packets

M2   pi   LS     TLLS   MLS     MTLLS
1    p1   1      1      1       1
     p2   0      0      0       0
     p3   0      0      0       0
2    p1   0.87   0.47   0.01    0
     p2   0      0.51   0.86    0.96
     p3   0      0      0       0
3    p1   0.68   0.56   0.21    0.20
     p2   0      0.30   0.14    0.09
     p3   0      0.03   0.033   0.59
4    p1   0.54   0.56   0.32    0.32
     p2   0      0.17   0.13    0.23
     p3   0      0.01   0.1     0.19
5    p1   0.41   0.47   0.30    0.35
     p2   0      0.1    0.09    0.15
     p3   0      0      0.02    0.05
The corresponding curves for the average system throughput as a function of
the retransmission probability are shown in Figure 5 for the conditions
indicated in Section 2. One can see the significant performance improvement
for the enhanced algorithms, especially for the MTLLS, compared to the
conventional LS estimator even for only two TL symbols.
Figure 5: Slotted ALOHA throughput performance for different algorithms
6. CONCLUSION
It has been shown analytically that a possibility to recover more than one
user in a RACH collision can significantly improve system performance. A
semi-analytical approach has been proposed to evaluate the average throughput
of a Slotted ALOHA system with multiple capture. Three semi-blind space-time
filtering algorithms with multiple capture ability have been presented. Their
efficiency compared to the training-based algorithm with a power capture has
been demonstrated in a GSM(EDGE) environment.
7. REFERENCES
[1] L. G. Roberts, “ALOHA packet system, with and without slots and
capture”, ACM Computer Communication Review, vol. 5, no. 2, pp. 28-42,
Apr. 1975.
[2] C. Namislo, “Analysis of Mobile Radio Slotted ALOHA Networks”, IEEE
Journal on Selected Areas in Communications, vol. SAC-2, no. 4, pp. 583-588,
Jul. 1984.
[3] J.J. Metzner, “Comments on a widely used capture model for Slotted
ALOHA”, IEEE Transactions on Communications, vol. 44, no. 4, p. 419,
Apr. 1996.
[4] W. Feller, “An introduction to probability theory and its applications”,
Wiley, 1968.
[5] A.M. Kuzminskiy, D. Hatzinakos, “Semi-blind estimation of spatio-temporal
filter coefficients based on a training-like approach”, IEEE Signal
Processing Letters, vol. 5, n. 9, pp. 231-233, Sept. 1998.
[6] A.M. Kuzminskiy, P. Strauch, “Space-time filtering with suppression of
asynchronous co-channel interference”, to be published in Proc. AS-SPCC,
2000.
[7] A.M. Kuzminskiy, “Finite amount of data effects in spatio-temporal
filtering for equalization and interference rejection in short burst
wireless communications”, to be published in Signal Processing, vol. 80,
n. 10, 2000.
[8] G.J.M. Janssen, “BER and outage performance of a dual signal receiver for
narrowband BPSK modulated co-channel signals in a Rician fading channel”, in
Proc. PIMRC, pp. 601-606, 1994.
[9] A.M. Kuzminskiy, D. Hatzinakos, “Multistage semi-blind spatio-temporal
processing for short burst multiuser SDMA systems”, in Proc. 32nd Asilomar
Conf. on Signals, Systems and Computers, pp. 1887-1891, 1998.
[10] B. Widrow, J.M. McCool, M.G. Larimore, C.R. Johnson, Jr., “Stationary
and nonstationary learning characteristics of the LMS adaptive filters”,
Proc. IEEE, vol. 64, pp. 1151-1162, Aug. 1976.
[11] Y.I. Abramovich, “Controlled method for adaptive optimization of filters
using the criterion of maximum SNR”, Radio Engineering and Electronic
Physics, vol. 26, n. 3, pp. 87-95, 1981.
ASYMMETRIC SIGNALING CONSTELLATIONS FOR PHASE ESTIMATION
Trasapong Thaiupathump, Charles D. Murphy and Saleem A. Kassam
Department of Electrical Engineering
University of Pennsylvania
Philadelphia, PA 19104
e-mail: kassam@ee.upenn.edu
ABSTRACT
In digital communication systems, most commonly used signaling constellations
are symmetric. Without a pilot tone or known training sequence, an arbitrary
phase rotation cannot be identified from a symmetric constellation. The
standard approach to overcome the phase ambiguity is to use differential
encoding. In this paper, we introduce the notion of using an asymmetric
constellation instead of a symmetric constellation with differential
encoding. The absolute phase of an asymmetric constellation can be determined
using blind statistics of processed channel outputs. Through simulation and
analysis, we study the trade-offs between asymmetry and other features of a
constellation, such as data rate, power, and symbol separation.
1. INTRODUCTION
A symmetric constellation has the property that blind processing is unable to
identify an arbitrary rotation of symbols. Synchronization with the phase of
the transmitted carrier may be done by using pilot tones or known training
sequences. In a blind system, without a pilot tone or training sequence, the
receiver must rely on statistics of channel outputs to recover the phase of
the received signal. All of the commonly used symbol constellations (PAM,
PSK, QAM, and others) are symmetric when the symbols are equiprobable. Blind
statistics of these constellations cannot produce an absolute phase estimate.
To overcome the phase ambiguity, the mapping between the data and the symbols
has to be invariant to an unknown reference phase. A simple method is to use
differential encoding. Since each symbol is used to determine two symbol
transitions, a symbol decision error will usually result in two transition
errors. The penalty incurred by differential encoding is well characterized
as a 2-3 dB loss in SNR [2], [3].
In this paper, we introduce the notion of using an asymmetric constellation.
The absolute phase of an asymmetric constellation can be estimated using
blind statistics of processed channel outputs. We discuss symmetry,
asymmetry, and how to design asymmetric constellations and absolute phase
estimators. Through simulation and analysis, we study the performance of
various absolute phase estimators and the trade-offs between asymmetry and
other features of a constellation.
2. SYMMETRY BREAKING
$M$-ary PAM, QAM, and PSK are the most often encountered symmetric
constellations. A symmetric constellation may be rendered asymmetric by
changing the symbol values and/or the symbol probabilities.
Consider an $M$-ary constellation with $M = 2^m$ equiprobable i.i.d. symbols.
The data rate (in bits/symbol) or entropy of the constellation is
$$H(S) = -\sum_{i=0}^{M-1} p_i \log_2 p_i = m \tag{1}$$
where $p_i$ is the probability of symbol $i$. If the number of symbols and
the symbol locations are to remain unchanged, an asymmetric constellation can
be obtained by adjusting the symbol probabilities. Because the symbols in the
asymmetric constellation are no longer equiprobable, the data rate of the
constellation is strictly lower than that of the corresponding symmetric
constellation. This is a trade-off of data rate for asymmetry.
Figure 1: Symmetric and Asymmetric 8-PSK (Manipulation of the Symbol
Probabilities)
Fig. 1(a) illustrates an 8-PSK constellation with equiprobable symbols
$\sqrt{A}\,e^{j2\pi i/8}$, $i = 0, \dots, 7$, constant transmitted power $A$,
and a data rate of 3 bits/symbol. On the right is an asymmetric 8-PSK
obtained by manipulating the symbol probabilities. The value of $p_0$ has
been reduced by a small $\delta$, $0 < \delta < 1/8$. To maintain
$\sum_{i=0}^{7} p_i = 1$ and a zero DC value, the probabilities of some other
symbols have also been changed. Since the symbols in the second constellation
are no longer equiprobable, the data rate of the constellation is strictly
less than 3 bits/symbol. Figure 2 shows the exact reduction in entropy as a
function of $\delta$ for $0 < \delta < 0.12$.
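The entropy loss can be checked numerically. The sketch below assumes one
particular probability assignment that satisfies both constraints:
$p_0 = 1/8 - \delta$, $p_1 = p_7 = 1/8 + \delta/\sqrt{2}$ (the value used in
Section 3.1), and $p_2 = p_6 = 1/8 - (\sqrt{2}-1)\delta/2$, with the remaining
probabilities left at 1/8. The adjustment of $p_2$ and $p_6$ is an assumption
inferred here, not stated in the text; it is chosen because it reproduces the
value $H(S) = 2.9542$ quoted in Section 3.1 for $\delta = 0.06$.

```python
import numpy as np

def asymmetric_8psk_probs(delta):
    """Symbol probabilities p_0..p_7 (assumed assignment, see the lead-in)."""
    p = np.full(8, 1.0 / 8.0)
    p[0] -= delta                                      # p_0 reduced by delta
    p[[1, 7]] += delta / np.sqrt(2.0)                  # keeps the mean at zero
    p[[2, 6]] -= (np.sqrt(2.0) - 1.0) * delta / 2.0    # restores sum(p) = 1
    return p

def entropy_bits(p):
    """H(S) = -sum_i p_i log2 p_i, in bits per symbol."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p = asymmetric_8psk_probs(0.06)
symbols = np.exp(1j * 2 * np.pi * np.arange(8) / 8)
H = entropy_bits(p)             # about 2.954 bits/symbol
dc = (p * symbols).sum()        # DC value of the constellation, should vanish
```

With this assignment the rate loss relative to 3 bits/symbol is about 1.5%.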
Figure 2: $\delta$ vs. $H(S)$ for the Asymmetric 8-PSK
Figure 3: Symmetric and Asymmetric 8-PSK (Symbol Relocation)
In Fig. 3, an alternative way of introducing asymmetry is shown: some of the
original 8-PSK constellation points are relocated without changing the equal
probability assigned to the 8 points. With symbol probability and power
unchanged, the symbol $s_1$ is rotated counterclockwise by $\delta$ radians.
In order to maintain the zero-mean condition, symbols $s_2$, $s_6$, and $s_7$
must also be relocated. The symbols $s_2$, $s_6$, and $s_7$ are rotated by
$-\epsilon$, $+\epsilon$, and $-\delta$ radians, respectively, where
$\epsilon = \sin^{-1}\{\cos(\pi/4)\cdot(1 - \cos(\delta) + \sin(\delta))\}$.
However, introducing asymmetry by moving some symbols closer together will
cause more erroneous symbol decisions. With perfect phase estimation, the
union bound of the error rate is
[Eq. (2): the union bound expression is missing in the source]
For small $\delta$, the error rate performance can be very close to that of
the coherent symmetric 8-PSK constellation.
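The relocation rule can be verified numerically. The sketch below builds the
relocated constellation for a hypothetical $\delta = 0.1$, checks that the
mean stays at zero, and computes the resulting minimum symbol separation; the
function name `relocated_8psk` is an assumption of this example.

```python
import numpy as np

def relocated_8psk(delta, A=1.0):
    """8-PSK points with s1, s7 moved by +/-delta and s2, s6 by -/+epsilon."""
    eps = np.arcsin(np.cos(np.pi / 4) * (1 - np.cos(delta) + np.sin(delta)))
    angles = 2 * np.pi * np.arange(8) / 8
    angles[1] += delta      # s1 rotated counterclockwise by delta
    angles[7] -= delta      # s7 rotated by -delta
    angles[2] -= eps        # s2 rotated by -epsilon
    angles[6] += eps        # s6 rotated by +epsilon
    return np.sqrt(A) * np.exp(1j * angles), eps

s, eps = relocated_8psk(0.1)
mean = s.mean()                                  # zero-mean check
d = np.abs(s[:, None] - s[None, :])
d_min = d[~np.eye(8, dtype=bool)].min()          # minimum symbol separation
```

The closest pairs are $(s_1, s_2)$ and $(s_6, s_7)$, whose angular spacing
shrinks from $\pi/4$ to $\pi/4 - \delta - \epsilon$; this is the loss in
minimum distance that is traded for asymmetry.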
3. ABSOLUTE PHASE ESTIMATION
3.1. Maximum Likelihood Approach
Consider the transmission of nonequiprobable M-PSK signals over an AWGN
channel. The M-PSK symbol has the complex form
$s_i = \sqrt{A}\,e^{j2\pi i/M}$, $i = 0, 1, \dots, M-1$, where $A$ denotes
the constant signal power. The transmitted symbol $x[n] = s_i$ with
probability $p_i$. The corresponding received sequence is then
$$r[n] = x[n]\,e^{j\phi} + w[n], \quad n = 0, 1, \dots, N-1 \tag{3}$$
where $w[n]$ is a sample of zero-mean complex white Gaussian noise and
$\phi \in (0, 2\pi)$ is an arbitrary phase introduced by the channel. For the
assumed AWGN model, the pdf of $r[n]$ can be modeled as a mixture of $M$
distributions
$$p(r[n]; \phi) = \sum_{i=0}^{M-1} p_i \cdot f_i(r[n]; \phi) \tag{4}$$
where
$$f_i(r[n]; \phi) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{|r[n] - s_i e^{j\phi}|^2}{2\sigma^2} \right). \tag{5}$$
Then the pdf of the sequence $\mathbf{r}$ is
$$p(\mathbf{r}; \phi) = \prod_{n=0}^{N-1} \left[ \sum_{i=0}^{M-1} p_i \cdot f_i(r[n]; \phi) \right]. \tag{6}$$
The MLE of $\phi$ is the value that maximizes the likelihood function in
Eq. (6). In general, the derivative of $\ln p(\mathbf{r}; \phi)$ with respect
to $\phi$ does not reduce to a simple form. The MLE of $\phi$ can be obtained
numerically by using iterative maximization procedures. The difficulty with
the use of these numerical methods is that in general the point found may not
be the global maximum but possibly only a local maximum or even a local
minimum.
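One simple way to sidestep local extrema is an exhaustive search of the
log-likelihood over a phase grid. The sketch below is an illustration rather
than the procedure used in the paper: it evaluates $\ln p(\mathbf{r}; \phi)$
from (6) on a 1-degree grid for an asymmetric 8-PSK. The probability
assignment, the sample size, the true phase, and the function names are all
assumptions of this example; the noise has variance $\sigma^2$ per real
component, as in (5).

```python
import numpy as np

def log_likelihood(r, phi, s, p, sigma2):
    """ln p(r; phi) from (6), up to a constant that does not depend on phi."""
    d2 = np.abs(r[:, None] - s[None, :] * np.exp(1j * phi)) ** 2
    a = np.log(p)[None, :] - d2 / (2.0 * sigma2)     # log of p_i * f_i
    m = a.max(axis=1, keepdims=True)                 # stabilise the log-sum-exp
    return float((m[:, 0] + np.log(np.exp(a - m).sum(axis=1))).sum())

def ml_phase_grid(r, s, p, sigma2, n_grid=360):
    """Exhaustive search for the phase that maximises the likelihood."""
    grid = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    ll = np.array([log_likelihood(r, phi, s, p, sigma2) for phi in grid])
    return grid[np.argmax(ll)]

# Asymmetric 8-PSK with an assumed probability assignment (delta = 0.06).
rng = np.random.default_rng(1)
delta = 0.06
p = np.full(8, 0.125)
p[0] -= delta
p[[1, 7]] += delta / np.sqrt(2)
p[[2, 6]] -= (np.sqrt(2) - 1) * delta / 2
s = np.exp(1j * 2 * np.pi * np.arange(8) / 8)

N, sigma2, phi_true = 4000, 0.01, 2.0                # hypothetical settings
x = s[rng.choice(8, size=N, p=p)]
w = np.sqrt(sigma2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
r = x * np.exp(1j * phi_true) + w
phi_hat = ml_phase_grid(r, s, p, sigma2)
```

The grid search costs `n_grid` likelihood evaluations, and its resolution is
limited by the grid step, so in practice it would typically serve as an
initialization for a local refinement.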
A simpler alternative likelihood method of finding the absolute phase is
based on the use of the phase statistics. We may express the unknown phase as
$\phi = \phi_0 + k \cdot (2\pi/M)$, where $k$ is an integer,
$0 \le k \le M-1$, and $0 \le \phi_0 < 2\pi/M$. Let $\theta[n]$ be the phase
angle of the received sequence $r[n]$. The $\phi_0$ is obtained first by
$$\hat{\phi}_0 = \frac{\bar{\phi}}{M} \tag{7}$$
where
$$\bar{\phi} = \text{angle}\left( \sum_{n=0}^{N-1} e^{jM\theta[n]} \right) \tag{8}$$
is the mean phase angle of the received sequence after each phase angle has
been multiplied by $M$. Then, the maximum likelihood method can be applied to
find the correct value of the integer $k$. Using the estimate
$\hat{\phi}_0$, the complex plane is divided into slices $q_i$,
$i = 0, 1, \dots, M-1$, bounded by phase angles
$\{\pi/M + \hat{\phi}_0 + i \cdot (2\pi/M),\ i = 0, 1, \dots, M-1\}$.
Although the nonequiprobable symbols do cause the optimum symbol-by-symbol
decision boundaries to change at the receiver, these angular decision
boundaries are close to being optimum for small $\delta$. Let $l_i$ be the
number of points in the received data sequence that fall in each region
$q_i$. Then, we are able to obtain the integer $k$ that maximizes the
likelihood function defined as
$$\hat{k} = \arg\max_k\, p(\mathbf{n}; k) \tag{9}$$
where $p(\mathbf{n}; k)$ can be modeled as a multinomial distribution with
$\mathbf{n} = [n_0\ n_1\ \dots\ n_{M-1}]$ and $n_i = l_{(i+k) \bmod M}$:
$$p(\mathbf{n}; k) = \frac{N!}{n_0!\, n_1! \cdots n_{M-1}!}\; p_0^{n_0} p_1^{n_1} \cdots p_{M-1}^{n_{M-1}} = N! \prod_{i=0}^{M-1} \frac{p_i^{n_i}}{n_i!}. \tag{10}$$
Since the multinomial coefficient does not depend on $k$, this is equivalent
to finding the integer $k$ that maximizes the log-likelihood function
$$\ln \prod_{i=0}^{M-1} p_i^{n_i} = \sum_{i=0}^{M-1} n_i \ln p_i. \tag{11}$$
Therefore, by using simple bin statistics, the absolute phase estimate is
$$\hat{\phi} = \hat{\phi}_0 + \hat{k} \cdot \left( \frac{2\pi}{M} \right). \tag{12}$$
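The two-step estimator (7)-(12) can be sketched as follows. The probability
assignment for the asymmetric 8-PSK, the sample size, the noise level, and
the true phase are assumptions chosen for this example, and
`blind_phase_estimate` is a hypothetical name.

```python
import numpy as np

def blind_phase_estimate(r, p):
    """Estimate phi0 from the M-th power phase, then k from bin counts."""
    M = len(p)
    theta = np.angle(r)
    # (7)-(8): mean phase after multiplying each phase angle by M
    phi_bar = np.angle(np.exp(1j * M * theta).sum())
    phi0 = np.mod(phi_bar, 2 * np.pi) / M             # 0 <= phi0 < 2*pi/M
    # Bin counts l_i: slice i is centred on phi0 + i * 2*pi/M
    idx = np.round((theta - phi0) / (2 * np.pi / M)).astype(int) % M
    l = np.bincount(idx, minlength=M)
    # (11): log-likelihood of each k with n_i = l_{(i+k) mod M}
    ll = [np.dot(np.roll(l, -k), np.log(p)) for k in range(M)]
    k = int(np.argmax(ll))
    return phi0 + k * 2 * np.pi / M                    # (12)

# Asymmetric 8-PSK with an assumed probability assignment (delta = 0.06).
rng = np.random.default_rng(2)
delta = 0.06
p = np.full(8, 0.125)
p[0] -= delta
p[[1, 7]] += delta / np.sqrt(2)
p[[2, 6]] -= (np.sqrt(2) - 1) * delta / 2
s = np.exp(1j * 2 * np.pi * np.arange(8) / 8)

N, sigma, phi_true = 20000, 0.1, 2.0                   # hypothetical settings
x = s[rng.choice(8, size=N, p=p)]
w = sigma * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
r = x * np.exp(1j * phi_true) + w
phi_hat = blind_phase_estimate(r, p)
```

Only one pass over the data and $M$ evaluations of (11) are needed, in
contrast to a numerical maximization of (6).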
To make the correct decision in estimating $k$, the estimates $\hat{p}_i$
should be close to their true values $p_i$. Finding the sample size $N$ to
generate reliable estimates of the $p_i$ requires the joint probabilities
that the $\hat{p}_i$ lie within some $\epsilon$ intervals centered on the
correct values. Using Chebyshev's Inequality, we can roughly determine the
number of required samples $N$ such that the estimate $\hat{p}_i$ is within
$\epsilon$ of its correct value $p_i$ with probability $1 - \tau$:
$$P\{|\hat{p}_i - p_i| < \epsilon\} \ge 1 - \frac{\sigma_i^2}{\epsilon^2} = 1 - \tau \tag{13}$$
where $\hat{p}_i = n_i/N$ with variance $\sigma_i^2 = \{p_i(1 - p_i)\}/N$.
Setting $\epsilon = \delta/P$, the required $N$ is given by
$$N = \frac{p_i(1 - p_i)P^2}{\tau\delta^2}. \tag{14}$$
For the asymmetric setting shown in Fig. 1(b), the most likely incorrect $k$
are the correct value of $k$ offset by $\pm 2$ (mod 8). The two largest
probability symbols $p_1$ and $p_7$ in Fig. 1 appear to be the most critical
values to consider. From Eq. (14), setting
$p_1 = p_7 = 1/8 + \alpha = 1/8 + \delta/\sqrt{2}$, we obtain
$$N = \frac{(7 + 8\sqrt{2}\,\delta - 32\delta^2)P^2}{64\,\tau\delta^2}. \tag{15}$$
Figure 4: $(N\tau/P^2)$ as a function of $\delta$
Figure 5: Comparison of MSE of phase estimation with different $\delta$.
In Fig. 4, we plot $(N\tau/P^2)$ as a function of $\delta$. As an example,
setting $\tau = 0.1$ and $P = 2\sqrt{2}$, which corresponds to setting
$\epsilon$ to half of the difference between the largest and the second
largest values of symbol probabilities, at $\delta = 0.06$,
$(N\tau/P^2) \approx 32$, which gives $N \approx 2{,}600$ samples. Figure 5
shows simulation results for the MSE of phase estimation as a function of $N$
for $\sigma^2 = 0.01$ and $A = 1$. At the same level of MSE performance, the
sample size needed for $\delta = 0.03$ is approximately 4 times larger than
the sample size needed for $\delta = 0.06$. While the MSE performance
includes contributions from both estimation of $\phi_0$ and estimation of
$k$, the relative dependence of $N$ on $\delta$ (i.e. the factor by which $N$
increases for decreasing $\delta$) is captured well by the approximation of
Fig. 4.
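The quoted numbers can be reproduced by evaluating (15) as printed. The short
sketch below does only that; the function name `normalized_sample_size` is an
assumption of this example.

```python
import math

def normalized_sample_size(delta):
    """N * tau / P**2 from (15) as a function of delta."""
    return (7 + 8 * math.sqrt(2) * delta - 32 * delta ** 2) / (64 * delta ** 2)

tau, P = 0.1, 2 * math.sqrt(2)
g06 = normalized_sample_size(0.06)            # roughly 32.8
N06 = g06 * P ** 2 / tau                      # roughly 2,630 samples
ratio = normalized_sample_size(0.03) / g06    # roughly 3.9
```

The ratio close to 4 reflects the dominant $1/\delta^2$ dependence of (15):
halving $\delta$ roughly quadruples the required sample size.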
Figure 6: Asymmetric 8-PSK, Symmetric 8-PSK, and Symmetric 8-DPSK Error Rate
Comparison
Figure 6 illustrates the error probability performance of the various
approaches. The bottom dashed line shows the error probability performance of
the coherent symmetric 8-PSK. The top curve shows the error probability of
symmetric 8-DPSK. The stars show the simulated error rates of asymmetric
8-PSK with $\delta = 0.06$ ($H(S) = 2.9542$) and $\sqrt{A} = 1$. The symbols
are rotated by an unknown constant phase $\phi \in (0, 2\pi)$ radians and
further distorted by AWGN. Statistics of 1,000 samples are used to estimate
the absolute phase angle. As shown in Fig. 6, the performance of asymmetric
8-PSK is close to 3 dB better than that for symmetric 8-DPSK at large SNR.
Note that the data rate of the asymmetric constellation is less than that of
symmetric 8-DPSK by approximately 1.52%. Thus, in order to make a meaningful
comparison of these two modulation methods, we should allow the 8-PSK
symmetric constellation to use some form of encoding with rate 0.985.
However, to obtain a coding gain of 3 dB, the rate will have to be
significantly lower. Thus we conclude that when we have a large enough sample
size for the phase estimate, the performance of asymmetric 8-PSK can be close
to that for the coherent symmetric 8-PSK constellation.
3.2. Nonparametric Methods
Without any prior knowledge of the probability distribution and exact
locations of symbol values, nonparametric or distribution-free methods can be
used to estimate the absolute phase rotation.
Figure 7: An absolute phase estimation scheme for Asymmetric 8-PSK obtained
by changing symbol locations
For an asymmetric constellation obtained by changing the symbol locations as
in Fig. 3, a simple and effective scheme for phase estimation is based on
noting that at the correct zero angle, roughly half of the samples will fall
in two angular regions bounded by $\pi/4$ and $\pi/2$ and by $-\pi/4$ and
$-\pi/2$, shown as two shaded regions in Fig. 7. The absolute phase can be
estimated by searching for the angle that gives the maximum number of points
in these two angular bins. This scheme works well in the presence of some
noise; at high SNR, however, it is only able to obtain the estimate within
$\epsilon$ of the correct phase angle. We can further search for the angle
within this range that gives the minimum mean square error from the center
angle between these search sectors.
Figure 8: Comparison of the noise performances of asymmetric 8-PSK
constellations obtained by changing symbol locations, with different
$\delta$ (symbol relocation case).
Figure 9: Comparison of MSE of phase estimation with different $\delta$
(symbol relocation).
Figure 8 shows the error probability performance for symmetric 8-DPSK and
asymmetric 8-PSK obtained by changing symbol locations, with different values
of $\delta$. The top and bottom dashed lines show the error rate performance
of symmetric 8-DPSK and symmetric 8-PSK, respectively. The solid lines show
the union bound of the error performance and the x marks show the simulated
results. Results are based on 1,000 equiprobable i.i.d. symbols rotated by an
unknown phase $\phi$ and further corrupted by AWGN.
Figure 9 illustrates the MSE of the phase estimate with different values of
$\delta$, assuming that the equiprobable i.i.d. symbols are rotated by an
unknown phase $\phi$ and further distorted by AWGN with variance 0.01. It
shows that with large $\delta$, the estimate converges to the absolute phase
faster than with low $\delta$. However, with larger $\delta$, some symbol
points are relocated closer to adjacent symbol points, which will cause more
erroneous symbol decisions.
The shape of the mask that is used in estimating the absolute phase is not
unique. The mask shown in Fig. 7 is just an example. We can use different
masks bounded by different angular boundaries, such as a half-plane shape
bounded by the $-\pi/2$ and $\pi/2$ angles. The properties of a good mask
shape are straightforward. It should give a maximum number of points at the
correct angle, and the number of points in the mask should fall when it
rotates away from the correct angle. Sensitivity analysis can be used to
evaluate the performance of the mask.
4. DISCUSSION
A symmetric constellation may be rendered asymmetric by changing the symbol
values and/or the symbol probabilities. Of these two methods of introducing
asymmetry to the existing symmetric 8-PSK, manipulating symbol probabilities
will certainly cause some reduction in the number of data bits transmitted
per symbol and some additional complexity in the encoding/decoding process
needed to obtain the asymmetric probability arrangement. For the second
asymmetric arrangement, the symbol probabilities remain unchanged, so the
data rate is the same as that of a symmetric constellation, without
additional complexity in the coding process.
5. CONCLUSION
An asymmetric constellation is introduced as an alternative to a regular
symmetric constellation with differential encoding. Without the use of a
pilot tone or known training sequence, the absolute phase of received symbols
can be estimated blindly from an asymmetric constellation using simple
statistics of the received symbols. By introducing asymmetry to an existing
symmetric constellation, the absolute phase recovery function is obtained at
the cost of a very small reduction in entropy and/or minimum distance. Both
the asymmetry of a constellation and the phase recovery function may be
considered as design choices, much like symbol separation, the number of bits
transmitted per symbol, or power, providing new tools for constellation
design.
6. REFERENCES
[1] C.D. Murphy, Blind Equalization of Linear and Nonlinear Channels, Ph.D.
Thesis, University of Pennsylvania, 1999.
[2] R.D. Gitlin, J.F. Hayes, and S.B. Weinstein, Data Communications
Principles, New York: Plenum Press, 1992.
[3] J.G. Proakis, Digital Communications, New York: McGraw-Hill, 1995.
[4] G.J. Foschini and R.D. Gitlin, “Optimization of Two-Dimensional Signal
Constellations in the Presence of Gaussian Noise,” IEEE Trans. on
Communications, Vol. COM-22, No. 1, January 1974.
[5] G.D. Forney et al., “Efficient Modulation for Band-Limited Channels,”
IEEE J. Selected Areas in Communications, Vol. SAC-2, pp. 632-647,
August 1984.
A CONVEX SEMI-BLIND COST FUNCTION FOR EQUALIZATION IN SHORT
BURST COMMUNICATIONS
Kelvin K. Au and Dimitrios Hatzinakos
Department of Electrical and Computer Engineering,
University of Toronto, Toronto, Ontario, Canada, M5S 3G4
Tel: (416) 978-1613, Fax: (416) 978-4425
{aukar,dimitris}@comm.toronto.edu
ABSTRACT
In short burst wireless communications, a training sequence is incorporated
in each burst for the receiver to adjust the equalizer coefficients. However,
when the number of training symbols is smaller than the number of
spatial-temporal equalizer tap weights, the conventional least-squares
technique may not provide good MSE performance. Blind methods, on the other
hand, may not achieve equalization in a short burst. A regularized semi-blind
algorithm was proposed previously by Kuzminskiy et al. to overcome this
problem, but local minima exist in the algorithm. A convex cost with training
symbols as the equalizer constraint is proposed in this paper to avoid
cost-dependent local minima. Furthermore, comparison with the regularized
semi-blind algorithm suggests that the proposed algorithm achieves a lower
MSE in the case of non-constant modulus signals such as 16-QAM signals.
1. INTRODUCTION
Conventional equalization techniques in wireless communications require
transmission of training sequences. This represents a system overhead and
effectively reduces the information rate. On the other hand, blind
equalization algorithms do not require training. One of the most popular
families of blind algorithms is that of the constant modulus algorithms
(e.g. CMA 2-2 or Godard [2] algorithm, CMA 1-2 or Sato algorithm). There are
several disadvantages in using the CMA family of algorithms. One of them is
the existence of local minima. In situations where a fractionally-spaced
equalizer or an antenna array is used, the Godard algorithm was shown to
converge globally [3]. Unfortunately, this is not true for the CMA 1-2 (Sato)
algorithm, which was demonstrated to have cost-dependent local minima in
either case [4]. Another drawback of blind algorithms is the slow convergence
and inability to achieve equalization in a short burst.
A regularized semi-blind algorithm was proposed in [1] which combined the LS
and CM 1-2 costs. The ability to successfully equalize the channels with a
spatial-temporal filter was demonstrated and thus offered the possibility of
reducing the number of training symbols. However, local minima inherent to
the cost exist. Using a convex cost will eliminate the possibility of
convergence to cost-dependent local minima. A blind convex cost with
equalizer tap-anchoring was introduced in [5, 6]. In this paper, we shall
make use of the training sequence in conjunction with the blind convex
cost [6] to formulate a new and more efficient semi-blind algorithm.
Simulation results demonstrate the potential of the proposed algorithm for
constant and non-constant modulus signals.
This work was supported by the Natural Sciences and Engineering Research
Council of Canada (NSERC).
2. SPATIAL-TEMPORAL SIGNAL MODEL
We assume there are $K$ users in the model. One of the users is the signal of
interest. Without loss of generality, we shall denote the first user to be
the desired signal. The remaining $K - 1$ signals are coming from nearby
co-channel cells. At the base station receiver, an antenna array of $M$
sensors is employed.
The data is processed in a burst of $N$ symbols which are assumed to be
received under a stationary environment. There are $N_t$ training symbols in
each burst and the starting position of the training sequence is $N_s$, which
is assumed to be known. The transmitted signals undergo linear channels which
are assumed to be FIR of length $N_c$. This assumption is valid when we have
a finite delay spread. Equalization is necessary when the delay spread is
larger than the symbol duration. The
received signal at the $j$-th sensor is given by:
$$y_j(n) = \sum_{i=1}^{K} \mathbf{c}_{ij}^H \mathbf{x}_i(n) + v_j(n) \tag{1}$$
for $i = 1, \dots, K$, $j = 1, \dots, M$,
$$\mathbf{c}_{ij} = [c_{ij}(0), \dots, c_{ij}(N_c - 1)]^T, \tag{2}$$
$$\mathbf{x}_i(n) = [x_i(n), \dots, x_i(n - N_c + 1)]^T, \tag{3}$$
where $H$ denotes the conjugate transpose of a matrix and each $c_{ij}(n)$ is
a complex Gaussian random variable whose amplitude does not change over the
duration of the burst. The noise $v_j(n)$ is a complex circularly symmetric
additive white Gaussian noise of variance $\sigma^2$.
Recall that the first user is the desired signal. The equalizer output for
the signal of interest is given by:
$$z_1(n) = \mathbf{w}^H \mathbf{y}(n), \tag{4}$$
where $\mathbf{y}(n) = [\mathbf{y}_1^T(n), \dots, \mathbf{y}_M^T(n)]^T$ and
each $\mathbf{y}_j(n) = [y_j(n), \dots, y_j(n - N_w + 1)]^T$. The
spatial-temporal equalizer taps are
$\mathbf{w} = [\mathbf{w}_1^T, \dots, \mathbf{w}_M^T]^T$ and each
$\mathbf{w}_j = [w_{j1}, \dots, w_{jN_w}]^T$. The vector $\mathbf{w}$ has a
dimension of $MN_w \times 1$.
3. LS SOLUTION WITH FEW TRAINING
SYMBOLS
When a burst of known symbols (training) is received, the method of least
squares can be used to obtain the spatial-temporal equalizer coefficients.
The following equation is satisfied:
$$\mathbf{R}\mathbf{w} = \mathbf{p}, \tag{5}$$
where $\mathbf{R}$ is the time-averaged spatial-temporal autocorrelation
matrix
$$\mathbf{R} = \frac{1}{N_t} \sum_{n=N_s}^{N_s+N_t-1} \mathbf{y}(n)\mathbf{y}(n)^H, \tag{6}$$
and $\mathbf{p}$ is the time-averaged spatial-temporal cross-correlation
vector
$$\mathbf{p} = \frac{1}{N_t} \sum_{n=N_s}^{N_s+N_t-1} x_1^*(n-d)\,\mathbf{y}(n) \tag{7}$$
for some delay $d$.
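If the number of training symbols is smaller than
the number of spatial-temporal equalizer coefficients
$N_w M$, the null space of $\mathbf{R}$ has dimension $N_w M - N_t$. Therefore there
are many solutions to (5), which can be expressed as:
$$\mathbf{w} = \mathbf{R}^+\mathbf{p} + \sum_{i=1}^{N_w M - N_t} v_i \mathbf{u}_i, \tag{8}$$
where $\mathbf{R}^+$ is the pseudo-inverse of $\mathbf{R}$, the $\mathbf{u}_i$'s form
an orthonormal basis of the null space of $\mathbf{R}$ and the $v_i$'s
are a set of coefficients. Equation (8) can be expressed
compactly as:
$$\mathbf{w} = \mathbf{R}^+\mathbf{p} + \mathbf{U}\mathbf{v}, \tag{9}$$
where $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N_w M - N_t}]$ and
$\mathbf{v} = [v_1, v_2, \ldots, v_{N_w M - N_t}]^T$.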
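The structure of the under-determined LS solution can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function name, the input layout (snapshots stacked as columns) and the eigenvalue threshold `rel_tol` are assumptions introduced here for illustration.

```python
import numpy as np

def ls_particular_and_nullspace(Y_train, x_train, rel_tol=1e-10):
    """Return (R, p, w0, U) for the under-determined LS problem R w = p.

    Y_train : (M*Nw, Nt) array whose columns are the stacked snapshots y(n)
              over the training window (hypothetical input layout).
    x_train : (Nt,) array of training symbols x_1(n - d), aligned with the columns.
    """
    Nt = Y_train.shape[1]
    R = Y_train @ Y_train.conj().T / Nt      # time-averaged autocorrelation, eq. (6)
    p = Y_train @ x_train.conj() / Nt        # time-averaged cross-correlation, eq. (7)

    # R is Hermitian, so eigh returns real eigenvalues and orthonormal eigenvectors.
    evals, evecs = np.linalg.eigh(R)
    keep = evals > rel_tol * evals.max()     # numerically non-zero eigenvalues
    V, lam = evecs[:, keep], evals[keep]

    w0 = V @ ((V.conj().T @ p) / lam)        # R^+ p, the particular solution
    U = evecs[:, ~keep]                      # orthonormal basis of the null space of R
    return R, p, w0, U


rng = np.random.default_rng(0)
MNw, Nt = 8, 5                               # fewer training symbols than taps
Y_train = rng.standard_normal((MNw, Nt)) + 1j * rng.standard_normal((MNw, Nt))
x_train = (rng.choice([-1.0, 1.0], Nt) + 1j * rng.choice([-1.0, 1.0], Nt)) / np.sqrt(2)

R, p, w0, U = ls_particular_and_nullspace(Y_train, x_train)
v = rng.standard_normal(U.shape[1]) + 1j * rng.standard_normal(U.shape[1])
w = w0 + U @ v                               # eq. (9): every choice of v solves R w = p
```

In this toy setting the null space has dimension 8 - 5 = 3, and every `w` of the form (9) reproduces the training symbols exactly, which is why the training data alone cannot single out one equalizer.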
The semi-blind algorithm in [1] regularizes
the standard LS solution with the CM 1-2 cost to provide
a better estimate of the equalizer coefficients in
the case of $N_t < N_w M$. The algorithm minimizes the
cost
$$J(\mathbf{w}) = \sum_{n=N_s}^{N_s+N_t-1} |z_1(n) - x_1(n-d)|^2 + \rho \sum_{n=1}^{N} \left(|z_1(n)| - R_1\right)^2, \tag{10}$$
where $R_1 = E|a_n|^2 / E|a_n|$, with $a_n$ denoting the symbols
of the signal constellation, and $0 < \rho < \infty$ is a regularization
constant. We refer readers to [1] for details of the
algorithm.
4. SEMI-BLIND EQUALIZATION BASED
ON A CONVEX COST FUNCTION
4.1. Background
Since cost-dependent local minima exist in the regular¬
ized semi-blind algorithm, there are two ways to avoid
convergence to such minima: 1) devising a good initial¬
ization strategy of equalizer tap weights or 2) choosing
alternative cost functions that are convex. In this pa¬
per, we are primarily interested in adopting a convex
cost function in the problem of semi-blind equalization.
In [5] (and references therein), a convex cost function
based on the $\ell_\infty$ norm of the equalizer output was
proposed in the context of blind equalization. The idea
comes from the fact that the opening of the eye of the
signal constellation is characterized by the intersymbol
interference (ISI). Suppose the combined channel-equalizer
response is $c * w = h$; the eye is open when
the magnitude of $h(\delta)$ for some delay $\delta$ dominates the
rest of the coefficients, $|h(\delta)| > \sum_{k \neq \delta} |h(k)|$. This is closely
related to the $\ell_1$ norm of the combined channel-equalizer
response. In practice, however, we can never know the
channel response explicitly. An equivalent but more
useful formulation uses the $\ell_\infty$ norm of the equalizer
output [5, 6, 7]. In [6], the following cost is proposed:
$$J(\mathbf{w}) = \|\mathrm{Re}(z(n))\|_\infty + \|\mathrm{Im}(z(n))\|_\infty \tag{11}$$
with the constraint
$$\mathrm{Re}(w_{jk}) + \mathrm{Im}(w_{jk}) = 1. \tag{12}$$
Two remarks about (11) and (12) are in order:
1. The cost (11) is appropriate for square- type con¬
stellations such as 4-QAM, 16-QAM etc.
2. The constraint (12) anchors one of the equalizer
taps. This is needed to avoid the all-zero equal¬
izer coefficients which is a valid but trivial mini¬
mum to this type of convex cost function.
4.2. Convex cost with training constraint
In this section, we propose a linear constraint to be used
for the convex cost (11). We call it semi-blind because
the linear constraint makes use of the small amount of
known training symbols present in the received burst
of data. The idea was essentially discussed in the pre¬
vious section. When the number of training symbols is
fewer than the spatial-temporal equalizer coefficients,
the solution of the LS problem can be expressed as (and
restated here):
Rw = p (13)
w = R+p + Uv. (14)
Equation (13) can be viewed as a constraint on the
equalizer and can be adopted to replace the tap-an¬
choring technique. Hence (11) and (13) describe our
semi-blind convex cost.
There are several properties of this semi-blind algo¬
rithm:
1. The semi-blind constraint (13) is linear. It can be
thought of as a generalization of the tap-anchoring
technique.
2. Because of the linear constraint, convexity of the
cost (11) is still preserved.
3. Convexity of the cost (11) is established in a dou¬
bly infinite equalizer (ideal) setting and also in
a finitely parameterized equalizer (practical) set¬
ting [6]. Therefore, using an FIR equalizer main¬
tains convexity unlike the Godard cost function.
4. As in the case of the blind convex cost function,
this kind of equalization technique leaves an un¬
known gain at the equalizer output [7] . Hence an
automatic gain control (AGC) is needed to scale
the output. This can be done with the knowledge
of the known signal constellation.
4.3. Implementation
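Since the $\ell_\infty$ norm cannot be implemented in practice, we
approximate it with the $\ell_p$ norm for some large
$p$:
$$\begin{aligned}
J(\mathbf{w}) &= \|\mathrm{Re}(z(n))\|_\infty + \|\mathrm{Im}(z(n))\|_\infty \\
&= \lim_{p \to \infty} \|\mathrm{Re}(z(n))\|_p + \|\mathrm{Im}(z(n))\|_p \\
&\approx \left(E|\mathrm{Re}(z(n))|^p\right)^{1/p} + \left(E|\mathrm{Im}(z(n))|^p\right)^{1/p} \quad \text{for large } p.
\end{aligned} \tag{15}$$
Convexity is preserved in this approximation [7]. In
actual implementation, we can minimize the cost
$$J(\mathbf{w}) = E|\mathrm{Re}(z(n))|^p + E|\mathrm{Im}(z(n))|^p \tag{16}$$
to simplify computation. Substituting (14) in (16) and
taking the gradient with respect to $\mathbf{v}^*$, we obtain
$$\mathbf{G} = \nabla_{\mathbf{v}^*} J(\mathbf{v}) = E\Big\{ p\,\mathbf{U}^H\mathbf{y}(n)\Big(|\mathrm{Re}(z(n))|^{p-2}\mathrm{Re}(z(n)) - j\,|\mathrm{Im}(z(n))|^{p-2}\mathrm{Im}(z(n))\Big)\Big\}. \tag{17}$$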
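The step from (16) to (17) can be spelled out with the chain rule. The derivation below is a sketch under two conventions that are assumptions of this note, not statements from the paper: the shorthand $\mathbf{a}(n) := \mathbf{U}^H\mathbf{y}(n)$ and $\mathbf{w}_0 := \mathbf{R}^+\mathbf{p}$, and the reading of $\nabla_{\mathbf{v}^*}$ as $\partial/\partial\mathrm{Re}(\mathbf{v}) + j\,\partial/\partial\mathrm{Im}(\mathbf{v})$.

```latex
\begin{aligned}
z(n) &= \mathbf{w}^H\mathbf{y}(n) = \mathbf{w}_0^H\mathbf{y}(n) + \mathbf{v}^H\mathbf{a}(n), \\
\nabla_{\mathbf{v}^*}\,\mathrm{Re}(z(n)) &= \mathbf{a}(n), \qquad
\nabla_{\mathbf{v}^*}\,\mathrm{Im}(z(n)) = -j\,\mathbf{a}(n), \\
\nabla_{\mathbf{v}^*}\,|\mathrm{Re}(z(n))|^p &= p\,|\mathrm{Re}(z(n))|^{p-2}\,\mathrm{Re}(z(n))\;\mathbf{a}(n), \\
\nabla_{\mathbf{v}^*}\,|\mathrm{Im}(z(n))|^p &= -j\,p\,|\mathrm{Im}(z(n))|^{p-2}\,\mathrm{Im}(z(n))\;\mathbf{a}(n).
\end{aligned}
```

Adding the last two lines and taking the expectation gives the form of (17); the factor $-j$ on the imaginary branch comes from $\mathrm{Im}(\mathbf{v}^H\mathbf{a})$ depending on $\mathrm{Re}(\mathbf{v})$ and $\mathrm{Im}(\mathbf{v})$ through $\mathrm{Im}(\mathbf{a})$ and $-\mathrm{Re}(\mathbf{a})$ respectively.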
The received data is processed in a burst of $N$ symbols.
A recursive method based on gradient descent
is used to obtain the spatial-temporal equalizer coefficients.
The algorithm is given by:
$$\mathbf{v}^{(k+1)} = \mathbf{v}^{(k)} - \mu\,\hat{\mathbf{G}}^{(k)}, \tag{18}$$
where $\mathbf{v}^{(k)}$ denotes the vector $\mathbf{v}$ at the $k$-th recursion,
$\mu$ is a small step size and $\hat{\mathbf{G}}^{(k)}$ is an estimate of the
gradient (17) at the $k$-th recursion. This estimate is
obtained by averaging over the burst:
$$\hat{\mathbf{G}}^{(k)} = \frac{1}{N}\sum_{n=1}^{N} p\,\mathbf{U}^H\mathbf{y}(n)\Big(|\mathrm{Re}(z^{(k)}(n))|^{p-2}\mathrm{Re}(z^{(k)}(n)) - j\,|\mathrm{Im}(z^{(k)}(n))|^{p-2}\mathrm{Im}(z^{(k)}(n))\Big). \tag{19}$$
The algorithm is initialized with $\mathbf{v}^{(0)} = \mathbf{0}$. Such
initialization is equivalent to setting the equalizer to
$\mathbf{R}^+\mathbf{p}$ (i.e. the particular LS solution in (14)). Then
$\mathbf{w}^{(k)} = \mathbf{R}^+\mathbf{p} + \mathbf{U}\mathbf{v}^{(k)}$.
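The recursion (18)-(19) can be sketched in a few lines. This is an illustrative sketch only: the function names `lp_cost` and `convex_semiblind`, the argument layout, and the default values are assumptions made here; `Y` is taken to hold the stacked snapshots of the whole burst as columns, and `w0`, `U` are the particular solution and null-space basis from (14).

```python
import numpy as np

def lp_cost(w, Y, p):
    """Sample version of the cost (16) for equalizer w over the burst Y."""
    z = w.conj() @ Y                          # z(n) = w^H y(n)
    return np.mean(np.abs(z.real) ** p + np.abs(z.imag) ** p)

def convex_semiblind(Y, w0, U, p=12, mu=1e-3, n_iter=500):
    """Gradient descent on v, eqs. (18)-(19), keeping w = w0 + U v.

    Y  : (M*Nw, N) snapshots y(n) over the whole burst (assumed layout)
    w0 : (M*Nw,)   particular LS solution R^+ p
    U  : (M*Nw, D) orthonormal basis of the null space of R
    """
    N = Y.shape[1]
    A = U.conj().T @ Y                        # columns are U^H y(n)
    v = np.zeros(U.shape[1], dtype=complex)   # v^(0) = 0, i.e. start from R^+ p
    for _ in range(n_iter):
        z = (w0 + U @ v).conj() @ Y
        g = (np.abs(z.real) ** (p - 2) * z.real
             - 1j * np.abs(z.imag) ** (p - 2) * z.imag)
        G_hat = p * (A @ g) / N               # burst-averaged gradient, eq. (19)
        v = v - mu * G_hat                    # eq. (18)
    return w0 + U @ v, v
```

Because the update only moves along the columns of `U`, the training constraint (13) holds at every recursion; for large `p` the step size may need to be reduced when the initial outputs are large, which is the numerical issue mentioned for the early recursions.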
4.4. Simulation Results
In this section, we shall provide some simulation re¬
sults on the performance of the proposed semi-blind
algorithm. Three users’ signals ( K = 3) are impinging
on a receiver with four sensors (M = 4). The first user
is the desired signal and the other 2 users are interferers
from other co-channel cells. We shall assume that the
SNR of the desired signal at the receiver is 30 dB. The
signal-to-interference ratio (SIR) is 3 dB in our simula¬
tions. The signals go through their respective channels
which are modeled as 3 taps. This is the case when
the delay spread is around 3 — 4 symbol periods. At
the receiver, each sensor has an equalizer of length 6.
Hence the spatial-temporal equalizer has a total of 24
coefficients.
When implementing the semi-blind algorithm (16),
the choice of the exponent p has to be determined. Fig¬
ure 1 shows a plot of the MSE achieved using different
p's for 16-QAM signals. The MSE is lower when a
larger p is used. However, a compromise has to be
struck. Using too large a p might have numerical prob¬
lems in the recursion at the initial stage when the noise
and ISI is severe while using too small a p does not ap¬
proximate (16) well. The pure blind convex algorithm
in [6] uses $p = 12$. We shall also use this value of
$p$ in subsequent simulations. The step size $\mu$ for the
recursive algorithm is 0.001. The performance measure
is the mean square error (MSE) of the output.
We shall compare the MSE among the convex semi¬
blind, regularized semi-blind and pure LS algorithms
in the case where Nt < MNW. The blind algorithm
with tap-anchoring constraint (12) is also implemented
using a recursion similar to (18) but in terms of w. The
blind case (which does not take into account of known
symbols present in the burst) fails to converge under
this scenario for both 4-QAM and 16-QAM (Fig. (2)
and Fig. (3)). An AGC is used at the output for the
convex semi-blind algorithm so that the comparison is
meaningful. The AGC adjusts the gain by
$$g = \sqrt{\frac{E|a_n|^2}{\overline{|z(n)|^2}}},$$
where $a_n$ denotes the symbols of the constellation and
$\overline{|z(n)|^2}$ is the average over the burst. The term $E|a_n|^2$
can be pre-computed since the constellation is known.
This is, in fact, the variance of the constellation and in
our simulations, we set $E|a_n|^2 = 1$.
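A burst-level gain correction of this kind can be sketched as follows. The square-root power ratio used here is an assumption of this sketch (it matches the output power to the constellation power); the function name `agc` and its arguments are hypothetical.

```python
import numpy as np

def agc(z, constellation_power=1.0):
    """Scale the equalizer output so its average power matches E|a_n|^2."""
    burst_power = np.mean(np.abs(z) ** 2)     # average of |z(n)|^2 over the burst
    gain = np.sqrt(constellation_power / burst_power)
    return gain * z, gain


rng = np.random.default_rng(0)
z = 0.37 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))
z_scaled, gain = agc(z)
```

With the scaled output, the MSE can be compared directly against algorithms that do not leave an unknown gain at the equalizer output.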
Figure (2) shows the MSE vs. Nt for the case of
4-QAM signals. The MSE is that of the desired user.
The burst has 150 symbols. The LS curve indicates the
MSE if we are only using the training sequence to com¬
pute the equalizer coefficients. It is also an indication
on the MSE before passing through the semi-blind al¬
gorithms since we initialize the algorithms using the LS
solution. The regularized semi-blind algorithm is im¬
plemented as in [1]. Our convex semi-blind algorithm
runs for 500 recursions. The MSE plot is obtained by
averaging over 40 runs of bursts of 150 symbols. The
regularized semi-blind algorithm achieves smaller MSE
in this scenario than that of the convex semi-blind al¬
gorithm.
The next simulation is on 16-QAM signals. In this
case the MSE vs. Nt plot (Fig. (3)) is obtained by av¬
eraging 40 runs of bursts of 200 symbols. The convex
semi-blind algorithm iterates 500 times. We can see
that in this scenario, it has a smaller MSE starting
from $N_t = 12$ than the regularized semi-blind algorithm.
The latter method does not perform as well
as in the case of 4-QAM signals. If we can tolerate an
MSE of no more than, say, 0.05, then the regularized
semi-blind method will fail in this case while the convex
semi-blind method is suitable for Nt > 16 in a burst.
5. CONCLUSIONS
In this paper, a convex cost with training constraint
is proposed for semi-blind adjustment of the coeffi¬
cients of a spatial-temporal equalizer in general. Com¬
pared to other blind and semi-blind methods in a short
burst communication scenario, the proposed method
performs better especially with non-constant modulus
signal constellations. Such type of constellation is pro¬
posed in the 3rd generation wireless standard when
higher data rates are needed.
Figure 1: Plot of MSE vs. $N_t$ for the semi-blind convex
algorithm using different $p$ ($K = 3$, 16-QAM signals,
$N = 200$).
Figure 2: 4-QAM case: MSE vs. $N_t$ for the pure LS, convex
blind, convex semi-blind and regularized semi-blind
algorithms ($K = 3$, $N = 150$).
Figure 3: 16-QAM case: MSE vs. $N_t$ for the pure LS,
convex blind, convex semi-blind and regularized semi-blind
algorithms ($K = 3$, $N = 200$).
REFERENCES
[1] A. Kuzminskiy, L. Fety, P. Forster, S. Mayrargue, “Regularized semi-blind estimation of spatio-temporal filter coefficients for mobile radio communications,” in Proc. GRETSI’97, pp. 127-130, Grenoble, 1997.
[2] D. Godard, “Self-recovering equalization and carrier tracking in two-dimensional data communication systems,” in IEEE Transactions on Communications, vol. COM-28, pp. 1867-1875, November 1980.
[3] Z. Ding, “On convergence analysis of fractionally spaced adaptive blind equalizers,” in IEEE Trans. on Signal Processing, vol. 44, pp. 650-657, March 1997.
[4] Y. Li, K. J. R. Liu and Z. Ding, “Length- and cost-dependent local minima of unconstrained blind channel equalizers,” in IEEE Trans. on Signal Processing, vol. 44, pp. 2726-2735, November 1996.
[5] W. A. Sethares, R. A. Kennedy and Z. Gu, “An approach to blind equalization of non-minimum phase systems,” in ICASSP, pp. 1529-1532, 1991.
[6] R. A. Kennedy and Z. Ding, “Blind adaptive equalizers for quadrature amplitude modulated communication systems based on convex cost functions,” in Optical Engineering, vol. 31, pp. 1189-1199, June 1992.
[7] S. Vembu, S. Verdu, R. A. Kennedy and W. Sethares, “Convex cost functions in blind equalization,” in IEEE Trans. on Signal Processing, vol. 42, pp. 1952-1960, August 1994.
Performance Analysis of Blind Carrier Phase
Estimators for General QAM Constellations
E. Serpedin¹ (contact author), P. Ciblat², G. B. Giannakis³, and P. Loubaton²
¹ Dept. of Electrical Engineering, Texas A&M University, College Station, TX 77843-3128, Tel.: (979) 458 2287
Fax: (979) 862 4630 email: serpedin@ee.tamu.edu
² Universite de Marne-la-Vallee, Laboratoire “Systemes de Communication”, 5 Bd. Descartes, 77454
Marne-la-Vallee cedex 2, France
³ Dept. of Electrical and Computer Engr., University of Minnesota, 200 Union St. SE, Minneapolis, MN 55455
Abstract — Large quadrature amplitude modulation (QAM)
constellations are currently used in throughput efficient high
speed communication applications such as digital TV. For such
large signal constellations, carrier phase synchronization is a
crucial problem because for efficiency reasons the carrier ac¬
quisition must often be performed blindly, without the use of
training or pilot sequences. The goal of the present paper is
to provide thorough performance analysis of the blind carrier
phase estimators that have been proposed in the literature
and to assess their relative merits.
I. Introduction
Fast acquisition of the carrier phase is a crucial issue in
high-speed communication systems that employ large QAM
modulation schemes. One of the challenges associated with
large QAM constellations is the blind carrier acquisition,
which is often required in large and heavily loaded multipoint
networks for bandwidth efficiency and little effort involved in
network monitoring. It is known that for large QAM constel¬
lations, the conventional carrier tracking schemes frequently
fail to converge and result in “spinning” [8], [10]. There¬
fore, developing computationally simple blind carrier phase
estimators with guaranteed convergence and good statistical
properties is well-motivated.
Recently, a number of blind carrier phase estimators have
been proposed [1], [2], [3], [4], [6], [11, pp. 266-277], [12], but
a thorough performance analysis of all these algorithms has
not been performed. In order to quantify the performance of
these estimators, their large sample (asymptotic) performance
is established and
compared with the stochastic (modified) Cramer-Rao bound
[11, Section 2.4]. It is shown that the seemingly different
estimators [1], [2], [3], [5], [11, pp. 266-277], [12] are the same,
while the estimator proposed in [4] has a larger asymptotic
variance than the power-law estimator [3], [6], [12]. It is
also shown that exploiting the additional samples acquired
through oversampling the received continuous-time waveform
does not improve the performance of the power-law estimator
in [3], [6], [12]. Finally, computer simulations are presented
to corroborate the theoretical developments and to compare
the performance of the investigated phase estimators.
II. Problem Statement
We consider the baseband QAM communication system
where the received signal $Y(n) = Y_r(n) + jY_i(n)$ is given by
$$Y(n) = e^{j\theta}X(n) + N(n), \tag{1}$$
where $Y_r(n)$ and $Y_i(n)$ denote the in-phase and quadrature
components of $Y(n)$, $X(n)$ stands for the independent and
identically distributed (i.i.d.) input QAM symbol stream,
$N(n)$ is the circularly distributed Gaussian noise, assumed to
be independent of $X(n)$, and $\theta$ denotes the unknown carrier
phase offset. The problem of blind carrier phase estimation
consists of recovering the phase error $\theta$ only from knowledge
of the received data $Y(n)$. Because the input QAM constellation
has quadrant ($\pi/2$) symmetry, it follows that it
is possible to recover the unknown phase $\theta$ only modulo a
$\pi/2$-phase ambiguity. This ambiguity can be further eliminated
through the use of appropriate coding schemes. Therefore,
without any loss of generality, we can assume that the
unknown phase $\theta$ lies in the interval $(-\pi/4, \pi/4)$. In the next
section, we briefly outline the blind phase estimators [1], [2],
[3], [4], [5], [11, pp. 266-277], [12], and establish their exact
large sample performance.
III. Blind Carrier Phase Estimators
A. Approximate Maximum Likelihood Estimator: Fourth-
Power Estimator
The maximum likelihood (ML) estimator of 9 can be theo¬
retically derived by maximizing a stochastic likelihood func¬
tion, obtained by averaging the conditional probability den¬
sity function of the received data with respect to the unknown
data stream X(n). However, for high order QAM constella¬
tions, the computational complexity involved in calculating
the likelihood function and more importantly the resulting
nonlinear optimization problem render the ML-estimator im¬
practical for most high-speed applications. The need for com¬
putationally simple estimators with guaranteed convergence
calls for alternative (possibly suboptimal, but computation¬
ally feasible) phase estimators.
Moeneclaey and de Jonghe have shown in [12] that for
any arbitrary 2-dimensional rotationally symmetric constellation
(such as square or cross QAM constellations) the
fourth-power (or power-law) estimator can be obtained as
an approximate ML-estimator in the limit of small Signal-to-Noise
Ratio (SNR $:= 10\log_{10}(E|X(n)|^2/E|N(n)|^2)$, where $:=$
stands for “is defined as”). The power-law estimator and its
sampled version are defined as:
$$\theta = \frac{1}{4}\arg\left[E(X^{*4}(n))\,EY^4(n)\right], \tag{2}$$
$$\hat{\theta} = \frac{1}{4}\arg\left[E(X^{*4}(n))\,\frac{1}{N}\sum_{n=1}^{N} Y^4(n)\right], \tag{3}$$
where the superscript $*$ stands for complex conjugation and
the operator $E(\cdot)$ denotes the expectation operator. The
fourth-power estimator does not require any complex nonlinear
optimizations, but it requires a-priori knowledge of the
input constellation moment $E(X^{*4}(n))$. However, this is not a restrictive
assumption since for most QAM constellations, $EX^{*4}(n)$
is a negative real-valued number, whose effect can be easily
accounted for. Using standard convergence results [9] it can
be checked that asymptotically (3) is¹ w.p. 1 a consistent
estimator ($\hat{\theta} \to \theta$ as $N \to \infty$) for any SNR range. An explanation
can be obtained by observing that, in the presence of
circularly and normally distributed noise $N(n)$, the following
relation holds:
$$\lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} Y^4(n) = EY^4(n) = e^{j4\theta}EX^4(n), \tag{4}$$
where the second equality in (4) is obtained by expanding
$EY^4(n) = E(\exp(j\theta)X(n) + N(n))^4$, taking into account the
independence between $X(n)$ and $N(n)$, and $EN^k(n) = 0$ for
any positive integer $k$. Hence, (3) recovers the carrier phase
from the phase of the fourth-order moment of the received
data.
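The sampled power-law estimator (3) takes only a few lines to try out. The sketch below is illustrative and not taken from the paper: the helper names `square_qam` and `fourth_power_estimate`, the unit-power normalization, and the simulation settings (16-QAM, 20 dB, 20000 samples, seed) are assumptions chosen for the example.

```python
import numpy as np

def square_qam(M):
    """Unit-power square M-QAM constellation (M must be a perfect square)."""
    m = int(round(np.sqrt(M)))
    levels = 2 * np.arange(m) - (m - 1)                  # e.g. -3, -1, 1, 3
    pts = (levels[:, None] + 1j * levels[None, :]).ravel()
    return pts / np.sqrt(np.mean(np.abs(pts) ** 2))

def fourth_power_estimate(y, ex4):
    """Eq. (3): one quarter of the phase of E(X*^4) times the sample mean of Y^4."""
    return 0.25 * np.angle(np.conj(ex4) * np.mean(y ** 4))


rng = np.random.default_rng(0)
const = square_qam(16)
ex4 = np.mean(const ** 4)                                # EX^4(n), negative real here

theta, N, snr_db = 0.2, 20000, 20.0                      # true offset inside (-pi/4, pi/4)
x = rng.choice(const, N)
sigma2 = 10 ** (-snr_db / 10)                            # noise power for unit signal power
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = np.exp(1j * theta) * x + noise

theta_hat = fourth_power_estimate(y, ex4)
```

Multiplying by the conjugate of `ex4` before taking the angle is what accounts for the negative sign of $EX^4(n)$; without it the estimate would be shifted by $\pi/4$.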
Cartwright has proposed estimating the unknown phase $\theta$
using a different set of fourth-order statistics [3]. Define the
following fourth-order moments and cumulants:
$$\gamma := E[Y_r^4] + E[Y_i^4] - 6E[Y_r^2 Y_i^2], \tag{5}$$
$$\gamma_a := \mathrm{cum}(Y_r, Y_r, Y_r, Y_i) = E[Y_r^3 Y_i] - 3E[Y_r^2]E[Y_r Y_i] = E[Y_r^3 Y_i], \tag{6}$$
$$\gamma_b := \mathrm{cum}(Y_r, Y_i, Y_i, Y_i) = E[Y_r Y_i^3] - 3E[Y_i^2]E[Y_r Y_i] = E[Y_r Y_i^3], \quad (E[Y_r Y_i] = 0). \tag{7}$$
Cartwright’s estimator is defined by:
$$\tan(4\theta) = \frac{4(\gamma_a - \gamma_b)}{\gamma}, \qquad \hat{\theta} = \frac{1}{4}\,\mathrm{atan}\left(\frac{4(\gamma_a - \gamma_b)}{\gamma}\right). \tag{8}$$
To verify that Cartwright’s estimator is the fourth-power estimator
in (2), we equate the in-phase and quadrature components of:
$$\begin{aligned}
EY^4(n) &= e^{j4\theta}EX^4(n) = \cos(4\theta)EX^4(n) + j\sin(4\theta)EX^4(n), \\
EY^4(n) &= E(Y_r(n) + jY_i(n))^4 = E[Y_r^4(n) + Y_i^4(n) - 6Y_r^2(n)Y_i^2(n)] + 4jE[Y_r^3(n)Y_i(n) - Y_r(n)Y_i^3(n)] \\
&= \gamma + 4j(\gamma_a - \gamma_b).
\end{aligned} \tag{9}$$
It follows that $\gamma = \cos(4\theta)EX^4(n)$ and $4(\gamma_a - \gamma_b) =
\sin(4\theta)EX^4(n)$, which implies the equivalence between estimators
(2) and (8). Cartwright’s (fourth-power) estimator
requires only that $EX^4(n) \neq 0$ and the independence between
$X(n)$ and additive circularly and normally distributed
noise $N(n)$, and it can be applied to both square and cross-QAM
constellations, as opposed to the estimator proposed in
[4], which can be applied only to square-QAM constellations.
It is interesting to remark that three other phase estimators,
derived using completely different arguments, are equivalent
to the fourth-power estimator. An alternative robust
phase estimator with guaranteed convergence has been proposed
in [2] for square-QAM constellations. Herein, the carrier
acquisition problem is reduced to the blind source separation
problem of the linear mixture of the in-phase and
quadrature-phase components of the received signal, and a
cumulant-based source separation criterion is proposed to estimate
the unknown phase-offset [2]. In [1], [11, pp. 271-277],
a low SNR approximation of the likelihood function, assuming
PSK input constellations, is shown to have the same form
as the estimator [2]. Furthermore, it is justified that this estimator
can be used even for general QAM constellations [11,
pp. 271-277]. By relying on Godard’s quartic criterion [8],
Foschini has shown an alternative derivation of this phase
estimator in [5]. Next, we describe briefly the estimator proposed
in [2], which relies on the observation that the in-phase
and quadrature components of a square-QAM constellation
are independent.
Let $\phi$ denote an estimate of the unknown phase offset $\theta$,
define the “rotated” output $\tilde{Y}(n) := \exp(-j\phi)Y(n)$, and
assume that $X(n)$ belongs to a square-QAM constellation.
In the absence of noise and if $\phi = \theta$, then the in-phase
and quadrature components of $\tilde{Y}(n) = X(n)$ are independent.
Thus, the joint cumulants of the in-phase ($\tilde{Y}_r(n)$) and
quadrature ($\tilde{Y}_i(n)$) components of $\tilde{Y}(n)$ are equal to zero
$$\begin{aligned}
\tilde{\gamma}_a &:= \mathrm{cum}(\tilde{Y}_r(n), \tilde{Y}_r(n), \tilde{Y}_r(n), \tilde{Y}_i(n)) = 0, \\
\tilde{\gamma}_b &:= \mathrm{cum}(\tilde{Y}_r(n), \tilde{Y}_i(n), \tilde{Y}_i(n), \tilde{Y}_i(n)) = 0,
\end{aligned} \tag{10}$$
and² $\tilde{\gamma}_a - \tilde{\gamma}_b = 0$. It is interesting to remark that (10)
continues to hold true even in the presence of additive circularly
and normally distributed noise $N(n)$, because the
cumulants of the in-phase and quadrature components of
$N(n)$ cancel out. By taking into account (9), it follows that
$\tilde{\gamma}_a - \tilde{\gamma}_b = (E\tilde{Y}^4(n) - E\tilde{Y}^{*4}(n))/8j$. Thus, $\theta$ can be estimated
from:
$$\begin{aligned}
\hat{\theta}_a &:= \arg\min_{\phi} \left| E\tilde{Y}^4(n) - E\tilde{Y}^{*4}(n) \right| \\
&= \arg\min_{\phi} \left| e^{-j4\phi}EY^4(n) - e^{j4\phi}EY^{*4}(n) \right|.
\end{aligned} \tag{11}$$
If we consider the polar representation $EY^4(n) =
\lambda_4\exp(j4\theta)$, from (11) we obtain that $\hat{\theta}_a = \arg\min_{\phi} |\lambda_4(\exp
(-j4(\phi - \theta)) - \exp(j4(\phi - \theta)))|$, which implies that $\hat{\theta}_a = \theta$
modulo a $\pi/4$-phase ambiguity. Hence, estimator (11) is the
same as the fourth-power estimator (2). By taking advantage
of the sign of $\tilde{\gamma} := (E\tilde{Y}^4(n) + E\tilde{Y}^{*4}(n))/2$ (see (5), (9)),
the $\pi/4$-phase ambiguity inherent in (11) can be reduced to
a $\pi/2$-phase ambiguity (since if $\hat{\theta}_a - \theta = \pi/4$ modulo $\pi/2$,
then $\tilde{\gamma} = -EX^4(n) \neq EX^4(n)$).
In practice, many communication systems utilizing QAM
constellations employ also coding, which implies that the
SNR available at the synchronizer will be reduced by an
amount proportional to the coding gain. In order to evaluate
correctly the performance of these phase estimators at
all SNR levels, next we provide an exact expression for the
large sample variance of the power-law estimator, which is
valid for any SNR level and is not restricted to the high
SNR regime, as is the case with the approximate asymptotic
expression presented in [12]. The next section will show that
the expression of [12] is not valid for low and medium SNRs
(< 20 dB).
¹ The notation w.p. 1 denotes convergence with probability one.
² The reader can easily check that $\tilde{\gamma}_a = -\tilde{\gamma}_b$ [4].
Theorem 1. Assuming that the i.i.d. symbol stream $X(n)$
is coming from a finite dimensional QAM-constellation and
that the additive noise $N(n)$ is circularly and normally distributed
and independent of $X(n)$, then the estimate (3) is
asymptotically normally distributed with zero mean and the
asymptotic variance:
$$\lim_{N \to \infty} N E(\hat{\theta} - \theta)^2 = \frac{\mu_{Y,44} - EX^8(n)}{32\,(EX^4(n))^2}, \tag{12}$$
with³ $\mu_{Y,40} := EY^4(n) = e^{j4\theta}EX^4(n)$, and
$$\mu_{Y,44} := E|X(n)|^8 + 16E|X(n)|^6E|N(n)|^2 + 36E|X(n)|^4E|N(n)|^4 + 16E|X(n)|^2E|N(n)|^6 + E|N(n)|^8. \tag{13}$$
Proof. Please see [13]. □
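The closed form (12)-(13) is straightforward to evaluate for a given constellation and SNR. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the constellation is normalized to unit power, and the noise moments use the standard identity $E|N(n)|^{2k} = k!\,\sigma^{2k}$ for circular complex Gaussian noise with $E|N(n)|^2 = \sigma^2$.

```python
import math
import numpy as np

def square_qam(M):
    """Unit-power square M-QAM constellation (M must be a perfect square)."""
    m = int(round(np.sqrt(M)))
    levels = 2 * np.arange(m) - (m - 1)
    pts = (levels[:, None] + 1j * levels[None, :]).ravel()
    return pts / np.sqrt(np.mean(np.abs(pts) ** 2))

def power_law_asymptotic_variance(const, snr_db):
    """Evaluate lim N E(theta_hat - theta)^2 from (12), with mu_{Y,44} from (13)."""
    sigma2 = np.mean(np.abs(const) ** 2) / 10 ** (snr_db / 10)

    def x_abs(k):                      # E|X|^{2k} over equiprobable symbols
        return np.mean(np.abs(const) ** (2 * k))

    def n_abs(k):                      # E|N|^{2k} for circular Gaussian noise
        return math.factorial(k) * sigma2 ** k

    mu44 = (x_abs(4) + 16 * x_abs(3) * n_abs(1) + 36 * x_abs(2) * n_abs(2)
            + 16 * x_abs(1) * n_abs(3) + n_abs(4))
    ex4 = np.mean(const ** 4).real     # EX^4 is real for these constellations
    ex8 = np.mean(const ** 8).real
    return (mu44 - ex8) / (32 * ex4 ** 2)


var_16qam_10db = power_law_asymptotic_variance(square_qam(16), 10.0)
var_16qam_30db = power_law_asymptotic_variance(square_qam(16), 30.0)
```

Dividing the returned value by the number of samples $N$ and taking the square root gives the predicted standard deviation in radians. For 4-QAM the noise-free term $E|X(n)|^8 - EX^8(n)$ vanishes, whereas for larger constellations it does not, which is the self-noise floor seen at high SNR.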
The asymptotic variance (12) does not depend on the unknown
phase $\theta$, but only on the input symbol constellation
and the SNR. This confirms the conclusion drawn in [3] stating
that the standard deviation of (8) appears to be constant
with respect to the true value of $\theta$. We evaluate next the
asymptotic performance of a phase estimator based on an
alternative set of statistics that was proposed in [4].
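B. HOS-Based Phase Estimator of [4]
The phase estimator [4] extracts the unknown phase information
$\theta \in (-\pi/4, \pi/4)$ using the relations:
$$\cot(2\theta) = \frac{\gamma_a - \gamma_b}{2\gamma} \quad \text{if } \frac{\gamma}{\gamma_x} > 0.125, \quad \theta \in \left(-\frac{\pi}{4}, -\frac{\pi}{8}\right) \cup \left(\frac{\pi}{8}, \frac{\pi}{4}\right), \tag{14}$$
$$\tan(2\theta) = \frac{2(\gamma_a - \gamma_b)}{\gamma_x - 4\gamma} \quad \text{if } \frac{\gamma}{\gamma_x} < 0.125, \quad \theta \in \left(-\frac{\pi}{8}, \frac{\pi}{8}\right), \tag{15}$$
with $\gamma_x := E|X|^4 - 2(E|X|^2)^2$ and
$$\gamma := \mathrm{cum}\{Y_r(n), Y_r(n), Y_i(n), Y_i(n)\} = E[Y_r^2(n)Y_i^2(n)] - E[Y_r^2(n)]E[Y_i^2(n)] = 0.25\sin^2(2\theta)\gamma_x. \tag{16}$$
Let $\hat{\gamma}_a$, $\hat{\gamma}_b$, and $\hat{\gamma}$ denote sample estimates for $\gamma_a$, $\gamma_b$, and $\gamma$,
respectively, and denote by $\hat{\theta}_1$ and $\hat{\theta}_2$ the sample estimates
corresponding to (14) and (15), respectively. The next theorem,
whose proof is deferred due to space limitations to [13],
establishes the asymptotic performance of $\hat{\theta}_1$ and $\hat{\theta}_2$.
Theorem 2. Assuming that the i.i.d. symbol stream $X(n)$
is coming from a finite dimensional QAM-constellation and
that the additive noise $N(n)$ is circularly and normally distributed
and independent of $X(n)$, then the estimates $\hat{\theta}_1$ and
$\hat{\theta}_2$ are asymptotically normally distributed with zero mean and
asymptotic variances:
$$\lim_{N \to \infty} N E(\hat{\theta}_1 - \theta)^2 = \frac{q_{11} + 4\cot^2(2\theta)q_{22} - 4\cot(2\theta)q_{12}}{\gamma_x^2}, \quad \theta \in \left(-\frac{\pi}{4}, -\frac{\pi}{8}\right) \cup \left(\frac{\pi}{8}, \frac{\pi}{4}\right), \tag{17}$$
$$\lim_{N \to \infty} N E(\hat{\theta}_2 - \theta)^2 = \frac{q_{11} + 4\tan^2(2\theta)q_{22} + 4\tan(2\theta)q_{12}}{\gamma_x^2}, \quad \theta \in \left(-\frac{\pi}{8}, \frac{\pi}{8}\right), \tag{18}$$
³ The notation $\mu_{Y,kl} := EY^k(n)Y^{*l}(n)$ stands for the $(k+l)$th-moment
of $Y(n)$.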
where:
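$$q_{11} := \lim_{N \to \infty} N E[(\hat{\gamma}_a - \hat{\gamma}_b) - (\gamma_a - \gamma_b)]^2 = -\frac{(EX^4(n))^2}{32} + \frac{\cos(8\theta)[(EX^4(n))^2 - EX^8(n)] + \mu_{Y,44}}{32}, \tag{19}$$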
qi2:= lim AT£{(7-7)[(7a -76) - (7a -75)]}
N—*oo
- sin (8 9)[EX8(n) - 2(£X4(n))2] + 2Im{^y,62}
64
4 sin (40)EX4{n)[pY,22 - y,u]
64
_ 8(E|X(n)|2 + g[lV(n)|a)Im{Aty,Bi}
64
.• „n,. \2 cos (89)EX8(n) + 3/ry44
Q22 := lim NE{~1 - 7) = - — - - —
N — >oo 1 Zo
4Re{^y62} + 48/iy,n + 6 [cos (4 9)EX4(n) - My22]
128
32/iyn [cos (40)EX4(n) -2E\Y(n)\4\
(20)
128
16[Re{/ry,5i} — py,33]py,ii
128
(21)
$\mu_{Y,44}$ is given by (13), and
$$\mu_{Y,62} := e^{j4\theta}[EX^6(n)X^{*2}(n) + 12EX^5(n)X^*(n)E|N(n)|^2 + 15EX^4(n)E|N(n)|^4], \tag{22}$$
$$\mu_{Y,51} := e^{j4\theta}[EX^5(n)X^*(n) + 5EX^4(n)E|N(n)|^2], \tag{23}$$
$$\mu_{Y,33} := E|X(n)|^6 + 9E|X(n)|^4E|N(n)|^2 + 9E|X(n)|^2E|N(n)|^4 + E|N(n)|^6, \tag{24}$$
$$\mu_{Y,22} := E|X(n)|^4 + 4E|X(n)|^2E|N(n)|^2 + E|N(n)|^4, \tag{25}$$
$$\mu_{Y,11} := E|X(n)|^2 + E|N(n)|^2. \tag{26}$$
As opposed to the power-law estimator, the asymptotic performance
of the Chen et al. estimator [4] depends on the
phase offset $\theta$. As the simulation results will show (see Figure 5),
the asymptotic performance of this estimator deteriorates
significantly whenever the a-priori intervals (14), (15)
are missed, and for any SNR it exhibits a larger variance than
the power-law estimator.
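The two-branch rule (14)-(15) can be sketched with sample cumulants as follows. This is an illustrative sketch rather than the implementation of [4]: the function names are hypothetical, the sample cumulants use the general zero-mean fourth-order cumulant formula (including the $E[Y_r Y_i]$ terms that vanish in expectation), and the branch is selected from the estimated ratio $\hat{\gamma}/\gamma_x$.

```python
import numpy as np

def square_qam(M):
    """Unit-power square M-QAM constellation (M must be a perfect square)."""
    m = int(round(np.sqrt(M)))
    levels = 2 * np.arange(m) - (m - 1)
    pts = (levels[:, None] + 1j * levels[None, :]).ravel()
    return pts / np.sqrt(np.mean(np.abs(pts) ** 2))

def chen_phase_estimate(y, gamma_x):
    """Phase estimate from (14) or (15), chosen by the ratio gamma / gamma_x."""
    yr, yi = y.real, y.imag
    m_rr, m_ii, m_ri = np.mean(yr ** 2), np.mean(yi ** 2), np.mean(yr * yi)
    g_a = np.mean(yr ** 3 * yi) - 3 * m_rr * m_ri            # sample version of (6)
    g_b = np.mean(yr * yi ** 3) - 3 * m_ii * m_ri            # sample version of (7)
    g = np.mean(yr ** 2 * yi ** 2) - m_rr * m_ii - 2 * m_ri ** 2   # sample version of (16)
    if g / gamma_x > 0.125:
        # (14): cot(2 theta) = (g_a - g_b) / (2 g), so tan(2 theta) is the reciprocal
        return 0.5 * np.arctan(2 * g / (g_a - g_b))
    # (15): tan(2 theta) = 2 (g_a - g_b) / (gamma_x - 4 g)
    return 0.5 * np.arctan(2 * (g_a - g_b) / (gamma_x - 4 * g))


const = square_qam(16)
gamma_x = np.mean(np.abs(const) ** 4) - 2 * np.mean(np.abs(const) ** 2) ** 2

rng = np.random.default_rng(0)
theta, N, snr_db = 0.1, 200000, 30.0
sigma2 = 10 ** (-snr_db / 10)
x = rng.choice(const, N)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
theta_hat = chen_phase_estimate(np.exp(1j * theta) * x + noise, gamma_x)
```

Near $|\theta| = \pi/8$ the estimated ratio can fall on the wrong side of 0.125, which is one way the a-priori interval can be missed in practice.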
IV. Performance Comparisons
In this section, computer simulations are performed to
assess the relative merits of the proposed phase estima¬
tors by comparing the theoretical (asymptotic) limits and
the experimental standard deviations of the investigated es¬
timators. Two additional estimators have been analyzed:
the fractionally-sampled (FS) power-law estimator and the
reduced-constellation power estimator. The FS-power es¬
timator recovers the unknown phase offset 0 by exploiting
173
all the samples obtained by fractionally-sampling (oversam¬
pling) the received continuous-time waveform in the estima¬
tor (3). A raised-cosine pulse shape with roll-off factor 0.3
and an oversampling factor P = 3 are assumed throughout
the simulations. The reduced-constellation power estimator
relies also on (3), but only the received samples that are
larger in magnitude than a given threshold are processed [10,
p. 1382], [6, p. 1482]. Thus, only the points closest to the
four corners of the constellation are processed. The asymp¬
totic performance of these two additional estimators can be
established using the result of Theorem 1, but due to space
limitations their expressions will not be presented.
In Figures 1-a and b, we have plotted the experimental
and theoretical standard deviations of all these estimators
versus SNR, assuming a square 256-QAM constellation,
$\theta = 15°$ ($= \pi/12$), $N = 512$ samples, $MC = 300$ Monte-Carlo
runs, and additive normally distributed noise. The threshold
in the reduced-constellation power estimator has been set
so that only the received samples corresponding to the 12
points of the input 256-QAM constellation with the largest
radii are processed. The solid line denotes the stochastic
Cramer-Rao bound (CRB $= 1/(N \cdot \mathrm{SNR})$) corresponding to
the phase estimate. Figure 1 shows that the power-law estimator
performs better than the Chen et al. estimator [4]
power estimator at high SNRs (SNR> 20 dB). The FS-based
power estimator appears to have the worst performance. The
reduced performance of the FS-power estimator is due to the
increased “self-noise” generated by the residual intersymbol
interference effects. For this reason, we have not pursued
further the analysis of FS-based power-law estimators.
In Figure 2, we have plotted separately the theoretical
and experimental standard deviations of the power-law, the
reduced-constellation power-law, and the Chen et al. (15) estimators,
assuming $MC = 300$ Monte-Carlo simulation runs,
$N = 512$ samples, $\theta = \pi/12$, and a 256-QAM input constellation.
The experimental values are well predicted by the
asymptotic limits for all three estimators, but the CRB seems
to be a loose bound. In Figure 3, the experimental and theoretical
standard deviations of the power-law and the Chen
et al. estimators are plotted versus the number of samples
($N$), assuming SNR $= 10$ dB, $MC = 300$ Monte-Carlo runs, and
$\theta = \pi/12$. It turns out that both estimators achieve the
asymptotic bound even when a reduced number of samples,
$N = 250$ to $500$, is used.
In Figure 4-a, the asymptotic performance of the Chen
et al. estimator (14) is analyzed, assuming $\theta = \pi/5$, $MC =
300$, and $N = 512$. Figures 4-b and 5 show that the performance
of the Chen et al. estimator depends on the unknown
phase $\theta$ and has a larger standard deviation than the
power-law estimator for any phase offset $\theta$ (Figure 5) and
for any SNR level (Figure 4-b). In Figure 5, the theoretical
standard deviations (17) and (18) are plotted on the interval
$(-\pi/4, \pi/4)$ assuming perfect a-priori knowledge of the
intervals (14), (15) where $\theta$ lies. However, in the presence of
a wrong a-priori knowledge on $\theta$ ($|\theta| > \pi/4$) the performance
of estimator [4] deteriorates significantly.
In Figures 6 and 7, we have analyzed the performance of
the power-law and the reduced-constellation power-law estimators
in the case of a cross 128-QAM constellation, assuming
$\theta = \pi/12$, $MC = 300$, $N = 4000$ samples. For such
constellations, the Chen et al. estimator cannot be used since
the in-phase and quadrature components of the input symbol
stream are not independent. In Figures 6 and 7-a, the ex¬
perimental and asymptotic standard deviations of the power-
law and the reduced-constellation power-law estimators are
plotted for different SNR levels. Figures 7-a,b show that the
asymptotic limit predicts well the experimental results for all
SNR-levels and number of samples N > 1000. It appears also
that for cross-QAM constellations, the power-law estimator
exhibits very slow convergence rate and good estimates of the
phase-offset can be obtained only by using a large number of
samples ( N > 5,000). Finally, Figure 8 reveals that the ap¬
proximate asymptotic limit derived in [12] does not predict
well the exact asymptotic limit of the power-law estimator
for small and medium SNRs (SNR< 20dB).
References
[1] A. N. D’Andrea, U. Mengali, and R. Reggiannini, “Carrier phase recovery for narrow-band polyphase shift keyed signals,” Alta Freq., vol. LVII, pp. 575-681, Dec. 1988.
[2] A. Belouchrani and W. Ren, “Blind carrier phase tracking with guaranteed global convergence,” IEEE Trans. on Signal Processing, vol. 45, no. 7, pp. 1889-1894, July 1997.
[3] K. V. Cartwright, “Blind phase recovery in general QAM communication systems using alternative higher order statistics,” IEEE Signal Processing Letters, vol. 6, no. 12, pp. 327-329, Dec. 1999.
[4] L. Chen, H. Kusaka, and M. Kominami, “Blind phase recovery in QAM communication systems using higher order statistics,” IEEE Signal Processing Letters, vol. 3, no. 3, pp. 147-149, May 1996.
[5] G. J. Foschini, “Equalizing without altering or detecting the data,” Bell Syst. Tech. J., vol. 64, pp. 1885-1911, Oct. 1985.
[6] C. Georghiades, “Blind carrier phase acquisition for QAM constellations,” IEEE Trans. on Communications, vol. 45, no. 11, pp. 1477-1486, Nov. 1997.
[7] F. Gini and G. B. Giannakis, “Frequency offset and symbol timing recovery in flat-fading channels: a cyclostationary approach,” IEEE Trans. on Communications, vol. 46, no. 3, pp. 400-411, March 1998.
[8] D. Godard, “Self recovering equalization and carrier tracking in two dimensional data communication systems,” IEEE Trans. on Communications, vol. 28, no. 11, pp. 1867-1875, Nov. 1980.
[9] T. Hasan, “Nonlinear time series regression for a class of amplitude modulated cosinusoids,” Journal of Time Series Analysis, vol. 3, no. 2, pp. 109-122, 1982.
[10] N. Jablon, “Joint blind equalization, carrier recovery, and timing recovery for high-order QAM signal constellations,” IEEE Trans. on Signal Processing, vol. 40, no. 6, pp. 1383-1397, June 1992.
[11] U. Mengali and A. N. D’Andrea, Synchronization Techniques for Digital Receivers, Plenum, NY, 1997.
[12] M. Moeneclaey and G. de Jonghe, “ML-oriented NDA carrier synchronization for general rotationally symmetric signal constellations,” IEEE Trans. on Communications, vol. 42, no. 8, pp. 2531-2533, Aug. 1994.
[13] “Proofs of Theorems 1, 2,” http://ee.tamu.edu/~serpedin.
Fig. 1. Standard Deviation vs. SNR: a) Experimental Values b) Asymptotic Values (256 square-QAM)
Fig. 2. Standard Deviation vs. SNR: Experimental/Theoretical Values a) Power Estimator b) Reduced-Constellation Power Estimator c) Chen et al. Estimator (256 square-QAM)
Fig. 3. Standard Deviation vs. No. of Samples: Power Estimator vs. Chen et al. Estimator (256 square-QAM)
Fig. 4. Standard Deviation vs. SNR: a) Chen et al. Estimator (θ = π/5) b) Asymptotic Limits (256 square-QAM)
Fig. 5. Standard Deviation vs. Phase offset: Asymptotic Limit (256 square-QAM)
Fig. 6. Standard Deviation vs. SNR: a) Power Estimator b) Reduced-Constellation Power Estimator (128 cross-QAM)
Fig. 7. Standard Deviation vs. SNR/Data: a) Reduced-Constellation Power-Law and Power-Law Estimators b) Power Estimator (128 cross-QAM)
Fig. 8. Standard Deviation vs. SNR: Exact and Approximate Asymptotic Limits (256 square-QAM)
UNBIASED PARAMETER ESTIMATION FOR THE IDENTIFICATION OF
BILINEAR SYSTEMS
Souad MEDDEB, Jean Yves TOURNERET and Francis CASTANIE
ENSEEIHT/TESA, 2 Rue Camichel, 31071 Toulouse, France
e-mail: meddeb@len7.enseeiht.fr
ABSTRACT
This paper addresses the problem of time-invariant
(TIV) bilinear system identification. The input-output
relation of a TIV bilinear system is expressed as a time-varying
recursive equation. This formulation allows us
to estimate the unknown bilinear system parameters
using a modified least-squares (MLS) algorithm. The
MLS method provides unbiased estimates of the unknown
bilinear parameters. Several simulations illustrate
the MLS estimator performance.
1. INTRODUCTION
Linear models have found a variety of applications in
many areas such as speech processing, image processing
and communications. These models include parametric
Autoregressive (AR), Moving Average (MA) and
Autoregressive Moving Average (ARMA) models. The
use of these parametric models can be motivated by
the following property: for any real-valued stationary
process y(n) with continuous spectral density S(f), it
is possible to find an ARMA process whose spectral
density is arbitrarily close to S(f) ([2], p. 130). However,
these models fail to identify many systems which
are inherently nonlinear.
The bilinear model has been used successfully to approximate
a large class of nonlinear systems [5] [7]. Its ability
to represent many nonlinearities efficiently and with
a relatively small number of parameters is owing to its
feedback structure [5]. Other properties motivating the
use of bilinear systems are also discussed in [4]. The
problem of estimating bilinear system parameters using
measurements of the system input and output signals
has received much attention in the literature [3] [6].
Recursive estimation algorithms including the recursive
least squares algorithm (RLS) and the extended least
squares algorithm (ELS) have been studied in [3]. The
main advantage of the RLS algorithm is its simplicity,
which results from the linearity in the parameters. However,
the algorithm provides biased estimates. Simulations
presented in [3] have shown that the ELS algorithm
outperforms the RLS algorithm in terms of bias. However,
no theoretical study was provided because of the
non-linear estimation problem and the difficult computation
required. Hence various methods have been
devised to obtain unbiased estimators from linear estimation
problems. Some of these methods are based on
modifying the least squares estimator by subtracting
the bias from the estimates [8]. This paper studies the
modified least squares (MLS) algorithm for the identification
of bilinear systems. The MLS algorithm yields
unbiased parameter estimates at a lower computational
cost than the ELS algorithm.
The paper is organized as follows. Section 2 presents
the problem. Section 3 studies the least-squares estimators,
and Section 4 the recursive MLS algorithm with noise
variance estimation. Simulation results, an application and
conclusions are reported in Sections 5 to 7.
2. PROBLEM FORMULATION
The output x(t) of a bilinear system driven by the input
sequence u(t) can be defined by the following recursive
equation:
$$x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + \sum_{i=1}^{p} b_i\, u(t-i) + \sum_{i=1}^{p} \sum_{j=1}^{p} c_{ij}\, u(t-j)\, x(t-i) \qquad (1)$$
where $a_i$, $b_i$, $c_{ij}$ are the unknown bilinear system parameters
and t = 1, ..., N. A noisy version of x(t), denoted
$$y(t) = x(t) + e(t) \qquad (2)$$
is observed (see fig. 1). In eq. (2), e(t) is a stationary
white Gaussian noise with zero mean and variance $\sigma^2$:
$$E[e(t)e(s)] = \sigma^2 \delta_{t,s}$$
where $\delta_{t,s}$ is the Kronecker symbol. Eqs. (1) and (2)
show that the observed process y(t) satisfies the following
time-varying (TV) model:
$$y(t) = a_0(t) + \sum_{i=1}^{p} a_i(t)\, y(t-i) + \epsilon(t) \qquad (3)$$
where the TV parameters are
$$a_0(t) = \sum_{i=1}^{p} b_i\, u(t-i), \qquad a_i(t) = a_i + \sum_{j=1}^{p} c_{ij}\, u(t-j), \quad i = 1, \ldots, p \qquad (4)$$
In eq. (3), $\epsilon(t)$ is a colored noise sequence defined by:
$$\epsilon(t) = e(t) - \sum_{i=1}^{p} a_i(t)\, e(t-i), \quad t = 1, \ldots, N$$
Model (3) is similar to the TV ARMA model studied in
[1] for the identification of non-stationary signals embedded
in noise. Indeed, $a_i(t)$ can be viewed as a linear
combination of functions $f_j(t)$ as follows:
$$a_i(t) = \sum_{j=0}^{p} a_{ij}\, f_j(t), \quad i = 0, \ldots, p \qquad (5)$$
with
$$a_{00} = 0, \quad a_{0j} = b_j, \quad a_{i0} = a_i, \quad a_{ij} = c_{ij}, \quad f_0(t) = 1, \quad f_j(t) = u(t-j), \quad i, j = 1, \ldots, p \qquad (6)$$
Eq. (5) is similar to the decomposition of the time-varying
AR parameters onto a set of basis time functions
studied in [1]. This paper proposes to estimate
the unknown bilinear system parameters from the input
and output samples u(t) and y(t) for t = 1, ..., N
using the modified least squares (MLS) algorithm [1], [8].
0-7803-5988-7/00/$10.00 © 2000 IEEE
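The equivalence between the bilinear recursion (1) and the time-varying form (3)-(4) can be checked numerically. The sketch below assumes zero initial conditions and a noise-free output; the function names are illustrative, and the coefficients are those of the second-order example used later in the simulations.

```python
import random

def bilinear_output(a, b, c, u):
    """Noise-free output x(t) of eq. (1); a and b have length p, c is p x p."""
    p, n = len(a), len(u)
    x = [0.0] * n  # zero initial conditions (assumption of this sketch)
    for t in range(p, n):
        linear = sum(a[i - 1] * x[t - i] + b[i - 1] * u[t - i]
                     for i in range(1, p + 1))
        bilinear = sum(c[i - 1][j - 1] * u[t - j] * x[t - i]
                       for i in range(1, p + 1) for j in range(1, p + 1))
        x[t] = linear + bilinear
    return x

def tv_coefficients(a, b, c, u, t):
    """Time-varying coefficients a_0(t) and a_i(t), i = 1..p, of eq. (4)."""
    p = len(a)
    a0 = sum(b[i - 1] * u[t - i] for i in range(1, p + 1))
    ai = [a[i - 1] + sum(c[i - 1][j - 1] * u[t - j] for j in range(1, p + 1))
          for i in range(1, p + 1)]
    return a0, ai

rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(200)]
a, b, c = [1.5, -0.7], [1.0, 0.5], [[0.12, 0.0], [0.0, 0.0]]
x = bilinear_output(a, b, c, u)

# Largest mismatch between eq. (1) and the TV form (3) without noise.
max_gap = 0.0
for t in range(2, len(u)):
    a0, ai = tv_coefficients(a, b, c, u, t)
    tv_value = a0 + sum(ai[i - 1] * x[t - i] for i in range(1, 3))
    max_gap = max(max_gap, abs(x[t] - tv_value))
```

The gap stays at rounding level, which reflects that (3) is only a regrouping of the terms of (1) around x(t-i).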
3. LEAST-SQUARES ESTIMATORS
Denote by $\theta^T = (b^T, \theta_1^T)$ the bilinear system parameter
vector, with $b^T = (b_1, \ldots, b_p)$ and
$$\theta_1^T = (a_1, c_{1,1}, c_{1,2}, \ldots, c_{1,p}, a_2, c_{2,1}, \ldots, a_p, \ldots, c_{p,p}) \qquad (7)$$
Eq. (3) can be written in matrix form as follows:
$$y(t) = y_{t-1}^T \theta + \epsilon(t), \quad t = 1, \ldots, N \qquad (8)$$
where
$$y_{t-1}^T = (u(t-1), u(t-2), \ldots, u(t-p),\; y(t-1), y(t-1)u(t-1), \ldots, y(t-1)u(t-p),\; \ldots,\; y(t-p), y(t-p)u(t-1), \ldots, y(t-p)u(t-p))$$
3.1. The Conventional LS Algorithm
The conventional least squares (LS) estimator of $\theta$, denoted
$\hat\theta_N$, is defined by
$$\hat\theta_N = \arg\min_{\theta} J_1(\theta) \qquad (9)$$
where $J_1(\theta) = \frac{1}{N} \sum_{t=1}^{N} \epsilon^2(t)$. Since $\epsilon(t)$ is linear w.r.t.
$\theta$, $J_1(\theta)$ is quadratic and an analytical solution for $\hat\theta_N$ can be derived:
$$\hat\theta_N = \left( \sum_{t=1}^{N} y_{t-1} y_{t-1}^T \right)^{-1} \sum_{t=1}^{N} y_{t-1}\, y(t)$$
The white noise sequence e(t) being zero-mean and
uncorrelated with x(t), $\lim_{N \to \infty} \hat\theta_N$ can be expressed as
a function of the true parameter vector as follows:
(10)
where $0_{p,p}$ is the p x p zero matrix,
$$P_N = \left( \sum_{t=1}^{N} y_{t-1} y_{t-1}^T \right)^{-1}$$
and
$$U_t = (1, u(t-1), \ldots, u(t-p))^T (1, u(t-1), \ldots, u(t-p)).$$
Eq. (10) shows that the LS estimator of $\theta$ is generally
asymptotically biased.
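A possible sketch of where this bias comes from is given below. It is a reconstruction under stated assumptions, not the paper's own expression for eq. (10): the input u(t) is treated as known and independent of e(t), $\otimes$ denotes the Kronecker product, and $\bar e_{t-1}$ is a name introduced here for the noise part of the regressor $y_{t-1}$.

```latex
% Estimation error of the LS solution, from eq. (8):
\hat\theta_N - \theta = P_N \sum_{t=1}^{N} y_{t-1}\,\epsilon(t),
\qquad \epsilon(t) = e(t) - \bar e_{t-1}^{\,T}\theta_1 ,
% with the noise part of the regressor stacked as
\bar e_{t-1}^{\,T} = \bigl(e(t-1)\,w_t^T,\ \ldots,\ e(t-p)\,w_t^T\bigr),
\qquad w_t^T = (1, u(t-1), \ldots, u(t-p)).
% Since e(t) is white with variance \sigma^2 and U_t = w_t w_t^T:
E\bigl[\bar e_{t-1}\bar e_{t-1}^{\,T}\bigr] = \sigma^2\,(I_p \otimes U_t),
\qquad
E\bigl[y_{t-1}\,\epsilon(t)\bigr]
  = -\sigma^2 \begin{pmatrix} 0_{p,\,p^2+p} \\ I_p \otimes U_t \end{pmatrix}\theta_1 .
```

Under these assumptions the error term does not vanish as N grows unless $\sigma^2 = 0$ or $\theta_1 = 0$, and the correlation matrix has size $(p^2+2p) \times (p^2+p)$, the size quoted for the matrices summed in the MLS estimator of Section 3.3.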
3.2. The Extended LS Algorithm
The Extended Least Squares (ELS) algorithm has
shown interesting properties for pseudo-linear regression
models such as (8) [3]. This algorithm can be
summarized as follows:
$$Q_N = 1 + y_N^T P_N y_N,$$
$$P_{N+1} = P_N - P_N y_N Q_N^{-1} y_N^T P_N,$$
$$\hat\theta_{N+1} = \hat\theta_N + P_N y_N Q_N^{-1} \left( y(N+1) - y_N^T \hat\theta_N \right),$$
$$\hat y(N+1) = y_N^T \hat\theta_{N+1},$$
$$y_{N+1}^T = \left( u(N), \ldots, \hat y(N), \ldots, \hat y(N+1-p)\, u(N+1-p) \right).$$
It is well known that the ELS algorithm provides unbiased
estimates. However, it suffers from stability problems [3].
The next section studies another unbiased estimator,
known as the Modified Least Squares (MLS) estimator.
3.3. The Modified LS Estimator
The MLS estimator, also called the bias-compensated
least squares estimator, is defined as follows [8]:
$$\tilde\theta_N = \hat\theta_N + \sigma^2 P_N V_N \tilde\theta_{1,N-1} \qquad (11)$$
The MLS estimator defined by eq. (11) is clearly
asymptotically unbiased. However, this estimator requires
computing the sum of N matrices of size
$(p^2+2p) \times (p^2+p)$. In order to avoid such computation,
we assume in the following that the input sequence u(t)
is a sequence of mutually independent and identically
distributed (i.i.d.) random variables with zero mean and
unit variance. In this case, the following result can
be obtained:
$$\lim_{N \to \infty} \frac{1}{N} V_N = V \qquad (13)$$
The following bias-compensated LS estimator can
then be defined:
$$\tilde\theta_N = \hat\theta_N + \sigma^2 N P_N V \tilde\theta_{1,N-1} \qquad (14)$$
Eq. (14) explicitly depends on the noise variance $\sigma^2$.
The next section studies a recursive algorithm for the joint
estimation of $\sigma^2$ and $\theta$ as in [8].
4. NOISE VARIANCE ESTIMATION FOR
THE MLS ALGORITHM
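Denote by $\xi_t(N)$ the residual at time N and by $R_N$ the sum
of squared residuals:
$$\xi_t(N) = y(t) - y_{t-1}^T \hat\theta_N \qquad (15)$$
$$R_N = \sum_{t=1}^{N} \xi_t^2(N) \qquad (16)$$
Eq. (8) shows that the residual $\xi_t(N)$ can be written
$$\xi_t(N) = y_{t-1}^T (\theta - \hat\theta_N) + e(t) - e_{t-1}^T \theta_1$$
where
$$e_{t-1}^T = (e(t-1), e(t-1)u(t-1), \ldots, e(t-1)u(t-p),\; \ldots,\; e(t-p), e(t-p)u(t-1), \ldots, e(t-p)u(t-p)).$$
It is well known that $\hat\theta_N$ satisfies the normal equations
(obtained by differentiating $J_1(\theta)$ with respect to $\theta$) [8]:
$$\sum_{t=1}^{N} y_{t-1}\, \xi_t(N) = 0$$
Consequently
$$R_N = \sum_{t=1}^{N} \left( e(t) - e_{t-1}^T \theta_1 \right) \xi_t(N)$$
hence
$$E\left[ \frac{1}{N} R_N \right] = \sigma^2 + \sigma^2 E\left[ \hat\theta_N^T \right] V \theta_1 \qquad (17)$$
By replacing the expectations $E[R_N/N]$ and $E[\hat\theta_N]$ by
their instantaneous values, an estimator of the noise
variance can be defined:
$$\hat\sigma_N^2 = \frac{1}{N} \frac{R_N}{1 + \hat\theta_N^T V \tilde\theta_{1,N-1}} \qquad (18)$$
The MLS algorithm for the joint estimation of the noise
variance $\sigma^2$ and the bilinear system parameter vector
$\theta$ is then based on the following recursive equations:
$$Q_N = 1 + y_N^T P_N y_N, \qquad (19a)$$
$$P_{N+1} = P_N - P_N y_N Q_N^{-1} y_N^T P_N, \qquad (19b)$$
$$\hat\theta_{N+1} = \hat\theta_N + P_N y_N Q_N^{-1} \left( y(N+1) - y_N^T \hat\theta_N \right), \qquad (19c)$$
$$R_{N+1} = R_N + \xi_{N+1}^2(N)\, Q_N^{-1}, \qquad (19d)$$
$$\hat\sigma_{N+1}^2 = \frac{1}{N+1} \frac{R_{N+1}}{1 + \hat\theta_{N+1}^T V \tilde\theta_{1,N}}, \qquad (19e)$$
$$\tilde\theta_{N+1} = \hat\theta_{N+1} + (N+1)\, \hat\sigma_{N+1}^2\, P_{N+1} V \tilde\theta_{1,N}. \qquad (19f)$$
Note that eqs. (19a), (19b) and (19c) are the classical
RLS equations [3]. Eqs. (19d), (19e) and (19f) ensure
that the bilinear system parameter estimates are
asymptotically unbiased. It is interesting to note that
the MLS algorithm does not require any matrix inversion.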
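The recursion can be sketched in plain Python as follows. This is an illustrative reading of the algorithm, not the authors' implementation. Assumptions made here and not stated in the source: V is taken as the block matrix [0; I], which is its expected form for an i.i.d. zero-mean unit-variance input; the residual sum is updated with the squared a priori error divided by $Q_N$; the bias compensation starts only after a short warm-up; and negative variance estimates are clipped to zero. All function and parameter names are hypothetical.

```python
import random

def regressor(u, y, t, p):
    """Regressor of eq. (8) at time t: inputs, then y(t-i) and y(t-i)u(t-j)."""
    row = [u[t - i] for i in range(1, p + 1)]
    for i in range(1, p + 1):
        row.append(y[t - i])
        row.extend(y[t - i] * u[t - j] for j in range(1, p + 1))
    return row

def mls(u, y, p, delta=1e-6, warmup=50):
    """Return (RLS estimate, bias-compensated estimate, noise variance estimate)."""
    d = p * p + 2 * p
    theta_hat, theta_mls = [0.0] * d, [0.0] * d
    P = [[(1.0 / delta if r == c else 0.0) for c in range(d)] for r in range(d)]
    R, sigma2, n = 0.0, 0.0, 0
    for t in range(p, len(y)):
        phi = regressor(u, y, t, p)
        Pphi = [sum(P[r][c] * phi[c] for c in range(d)) for r in range(d)]
        Q = 1.0 + sum(phi[r] * Pphi[r] for r in range(d))            # gain denominator
        err = y[t] - sum(phi[r] * theta_hat[r] for r in range(d))    # a priori error
        for r in range(d):                                           # covariance update
            for c in range(d):
                P[r][c] -= Pphi[r] * Pphi[c] / Q
        theta_hat = [theta_hat[r] + Pphi[r] * err / Q for r in range(d)]
        R += err * err / Q                                           # residual sum
        n += 1
        if n < warmup:                       # safeguard added in this sketch
            theta_mls = list(theta_hat)
            continue
        prev1 = theta_mls[p:]                # theta_1 part of the previous MLS estimate
        denom = 1.0 + sum(theta_hat[p + k] * prev1[k] for k in range(d - p))
        sigma2 = max(R / (n * denom), 0.0) if denom > 0 else 0.0
        # With V = [0; I], V*theta_1 is theta_1 padded with p leading zeros.
        corr = [sum(P[r][p + k] * prev1[k] for k in range(d - p)) for r in range(d)]
        theta_mls = [theta_hat[r] + n * sigma2 * corr[r] for r in range(d)]
    return theta_hat, theta_mls, sigma2

def simulate(n, sigma, seed=0):
    """Second-order bilinear example with white Gaussian input and output noise."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.0] * n
    for t in range(2, n):
        x[t] = (1.5 * x[t - 1] - 0.7 * x[t - 2] + u[t - 1] + 0.5 * u[t - 2]
                + 0.12 * x[t - 1] * u[t - 1])
    return u, [xt + rng.gauss(0.0, sigma) for xt in x]

# Parameter order follows theta = (b1, b2, a1, c11, c12, a2, c21, c22).
TRUE_THETA = [1.0, 0.5, 1.5, 0.12, 0.0, -0.7, 0.0, 0.0]
u0, y0 = simulate(300, 0.0)                  # noise-free sanity check
rls0, mls0, var0 = mls(u0, y0, p=2)
```

Calling `simulate(4000, sigma)` with a nonzero `sigma` and comparing the two returned parameter vectors against `TRUE_THETA` gives a quick way to look at the bias of the plain RLS estimate and the effect of the compensation step.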
5. SIMULATION RESULTS
Many simulations have been performed to illustrate the
previous theoretical results. For this experiment, consider
the following second-order bilinear system [3]:
$$x(t) = 1.5x(t-1) - 0.7x(t-2) + u(t-1) + 0.5u(t-2) + 0.12x(t-1)u(t-1)$$
The observed driving sequence u(t) is white Gaussian
with variance 1. The bilinear signal x(t) is contaminated
by white Gaussian noise with signal-to-noise ratios
(SNRs) ranging from 5 to 40 dB. The algorithm is
initialized with $\hat\theta = 0$ and $P_N = (1/\delta) I$, where $\delta \ll 1$.
Fig. 2 shows the convergence of the noise variance
estimate to its true value (SNR = 5 dB or equivalently
$\sigma^2 = 7.28$) from 10 Monte-Carlo simulations.
The mean square errors (MSEs) of the bilinear system
estimates using the RLS, ELS and MLS algorithms, computed
from 10 Monte-Carlo simulations, are depicted in
fig. 3 as a function of the SNR for N = 4000. The
MLS estimator clearly outperforms the usual RLS estimator
in terms of MSE. Fig. 3 also shows that the
MLS estimator outperforms the ELS estimator for low
SNRs. Tables 2 and 3 show the bias of the RLS, ELS and
MLS estimates for two values of SNR. As expected, the
MLS estimator outperforms the usual RLS estimator in
terms of bias. The MLS and ELS algorithms perform
very similarly in terms of bias.
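The correspondence between an SNR in dB and the noise variance can be made explicit with a short helper. The sketch assumes that the SNR is defined as the ratio of the mean power of x(t) to $\sigma^2$; the function names are illustrative.

```python
def noise_variance(signal_power, snr_db):
    """Noise variance giving the requested SNR (in dB) for a given signal power."""
    return signal_power / 10 ** (snr_db / 10)

def signal_power_from(noise_var, snr_db):
    """Signal power implied by a noise variance and an SNR in dB."""
    return noise_var * 10 ** (snr_db / 10)

# Signal power implied by sigma^2 = 7.28 at 5 dB, under the assumption above.
implied_power = signal_power_from(7.28, 5)
```

Under this definition, the quoted pair (5 dB, $\sigma^2 = 7.28$) corresponds to a mean signal power of about 23.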
6. APPLICATION: NONLINEAR
SATELLITE CHANNEL IDENTIFICATION
Several nonlinear techniques have been proposed for
modeling nonlinear channels with memory. These
techniques include Volterra series, wavelet networks
and neural networks [11]. The use of Volterra series to
model satellite channels was motivated in [9] and [10].
These Volterra models suffer from a number of parameters
that increases exponentially with the memory
and nonlinearity order. It is well known that the bilinear
model can be decomposed into a Volterra series with
a reduced number of parameters [4]. Consequently, this
paper proposes 1) to model the nonlinear satellite channel
using the bilinear model and 2) to identify this
nonlinear model using the LS procedures described in the
previous sections. A simplified satellite channel consists
of two earth stations connected by a satellite repeater,
as depicted in fig. 4 (see [11] for more details including
channel characteristics). As an example, Fig. 5 shows
the normalized prediction error between the outputs
of the noisy simplified satellite channel and the corresponding
bilinear system computed using the MLS algorithm.
7. CONCLUSION
The new contribution of this paper is to derive a modified
least squares algorithm, from the theory of linear
time-varying models, for the identification of
time-invariant bilinear models. A recursive version of the
modified least squares algorithm is derived as well. The
algorithm provides estimates of the noise variance and
bilinear model parameters. Bilinear MLS parameter estimates
are shown to be asymptotically unbiased. The
MLS estimator performance is compared to that of the
RLS and ELS estimators. The MLS estimator is finally
applied to the identification of nonlinear satellite
channels.
8. REFERENCES
[1] G. Alengrin, M. Barlaud and J. Menez, “Unbiased Parameter Estimation of Nonstationary Signals in Noise,” IEEE Trans. on ASSP, vol. 34, no. 5, pp. 1319-1322, Oct. 1986.
[2] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Verlag, 1990.
[3] F. Fnaiech and L. Ljung, “Recursive Identification of Bilinear Systems,” Int. J. Control, vol. 45, no. 2, pp. 453-470, 1987.
[4] D. Guegan, “Séries Chronologiques Non Linéaires à Temps Discret,” Statistique Mathématique et Probabilité, Economica.
[5] V. John Mathews, “Adaptive Polynomial Filters,” IEEE SP Magazine, pp. 10-26, July 1991.
[6] S. Meddeb, J. Y. Tourneret and F. Castanie, “Identification of Bilinear Systems Using Bayesian Inference,” Proc. of ICASSP, pp. 1609-1612, Seattle, USA, May 12-15, 1998.
[7] R. R. Mohler and W. J. Kolodziej, “An overview of bilinear system theory and applications,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-10, pp. 683-688, 1982.
[8] S. Sagara and K. Wada, “On-line modified least-squares parameter estimation of linear discrete dynamic systems,” Int. Jour. of Cont., vol. 25, no. 3, pp. 329-343, 1977.
[9] S. Benedetto, E. Biglieri and R. Daffara, “Modelling and performance evaluation of nonlinear satellite links - A Volterra series approach,” IEEE Trans. AES, vol. 15, pp. 494-506, July 1979.
[10] S. Meddeb and J. Y. Tourneret, “Identification of Non-linear Satellite mobile channels using Volterra Filters,” in Proc. EUSIPCO, Tampere (Finland), September 2000.
[11] M. Ibnkahla, N. J. Bershad, J. Sombrin and F. Castanie, “Neural networks modelling and identification of nonlinear channels with memory: Algorithms, applications and analytic models,” IEEE Trans. SP, vol. 46, no. 5, May 1998.
BLIND IDENTIFICATION OF LINEAR-QUADRATIC CHANNELS WITH
USUAL COMMUNICATION INPUTS
Nicolas PETROCHILOS (1,2) and Pierre COMON (2)
(1) CAS, Dept EE, Delft Univ. of Technology
Mekelweg 4, 2628 CD Delft, The Netherlands
petro@ieee.org
(2) I3S, Algorithmes-Euclide-B
2000 route des Lucioles, BP 121
F-06903 Sophia-Antipolis cedex, France
comon@unice.fr
ABSTRACT
This article presents a method to blindly identify linear-quadratic
channels (LQC). The method is designed for
the single-input/single-output (SISO) case with white
inputs with specific distributions (such as those usually
found in digital communications). Using High-Order
Statistics (HOS) of the input, the method is able to
match the third-order moments with the LQC model,
yielding an original simple relation. Several simulations
are performed and show a fair accuracy given
sufficiently long observation records.
1. INTRODUCTION
Nonlinear systems provide a better approximation to
real-life channels, and many examples of nonlinearities
can be found in nonlinear control systems [5], hydrodynamics
[4], satellite communication systems [1], or underwater
acoustics, among others. Blind methods are
attractive when the input is unknown, and to avoid the
reduction of the information rate caused by the insertion
of training sequences.
Blind identification of Volterra systems has already
been widely studied in the past. For instance, in [7],
the authors derive the cumulant-matching equations,
which allow the blind identification of a pure real quadratic
system with i.i.d. inputs of unknown distribution. Next,
in [2], P. Bondon goes much further, and derives identifiability
conditions when two input sequences are observed,
one Gaussian and one non-Gaussian.
In this paper, we focus our attention on linear-quadratic
systems, with specific discrete inputs, encountered
in n-PSK and QAM digital modulations.
So this contribution differs from the previous ones in
two respects: the system is not purely quadratic, and
the inputs are imposed to be discrete and of known
distribution. The scope is thus less general.
This work was partly supported by ENS Lyon, ENSEA, TU-Delft,
and the RNRT project “Paestum”. The first author thanks
A. Trindade, A. Heldring, S. Halford, and A. Elmilady for their
moral support, E. Serpedin for useful discussions, and G. Giannakis
for having drawn his attention to the non-linear blind
identification problem.
2. MODEL FORMULATION
The problem is modeled here by the parameterization
of the channel and by the statistics of the inputs.
2.1. Volterra kernel model
The model is described by the noisy output of a nonlinear
moving-average Volterra system (which can
be of any order). Sampling with period $T_s$ and restricting
to the linear-quadratic case, the channel can be
modeled as:
$$y(n) = \sum_{l=0}^{L_1} h_1(l)\, x(n-l) + v(n) + \sum_{i,j=0}^{L_2} h_2(i,j)\, x(n-i)\, x(n-j) \qquad (1)$$
where x(n) is the input signal, v(n) denotes the additive
noise, and $h_n$ is called the $n^{th}$-order Volterra
non-linear operator (here, we only have the linear and
the quadratic terms: $h_1(l)$ and $h_2(i_1, i_2)$). Without loss
of generality, we consider that $h_n$ is symmetric in its
arguments [6, pp. 80-81].
2.2. Usual communication inputs
For the sake of convenience, denote:
$$\epsilon_{ab} = E\left[ x^a x^{*b} \right].$$
In this article, we consider inputs commonly used in
digital communications, sharing the high-order properties:
$$\epsilon_{21} = \epsilon_{31} = \epsilon_{32} = \epsilon_{41} = \epsilon_{42} = 0 \qquad (2)$$
Among these inputs, two groups have been identified
(see [9] and [3]):
• Distributions that are symmetric about both axes
in the complex plane: $p(z) = f(\Re\{z\}) \cdot g(\Im\{z\})$.
Corresponding random variables can be rewritten
as $z = s + j s'$, where $s$ and $s'$ are real, independent,
and symmetrically distributed, and
$j^2 = -1$. QAM constellations, in digital communications,
belong to this class.
• Discrete distributions that are invariant under a rotation
of an angle of the form $2\pi/K$ ($K \in \mathbb{N}$).
QPSK, double QPSK, and any n-PSK are included
in this class as soon as $n \geq 4$.
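Properties (2) are easy to check numerically for a given constellation. The sketch below evaluates $\epsilon_{ab} = E[x^a x^{*b}]$ for equiprobable QPSK and 16-QAM points; the constellation scaling (unnormalized integer grid) and the function name are choices made for this illustration only.

```python
def moment(points, a, b):
    """epsilon_ab = E[x^a (x*)^b] for equiprobable constellation points."""
    return sum(z ** a * z.conjugate() ** b for z in points) / len(points)

qpsk = [complex(re, im) for re in (-1, 1) for im in (-1, 1)]
qam16 = [complex(re, im) for re in (-3, -1, 1, 3) for im in (-3, -1, 1, 3)]

# Moments required to vanish by properties (2).
zero_orders = [(2, 1), (3, 1), (3, 2), (4, 1), (4, 2)]
largest = max(abs(moment(c, a, b)) for c in (qpsk, qam16) for a, b in zero_orders)

# Moments assumed known in the identification step (variance and fourth-order moment).
eps11_qpsk, eps22_qpsk = moment(qpsk, 1, 1), moment(qpsk, 2, 2)
```

For both constellations the listed moments vanish because the point sets are invariant under a rotation of $\pi/2$, which multiplies $\epsilon_{ab}$ by $e^{j(a-b)\pi/2}$.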
3. CHANNEL IDENTIFICATION
First, the basis of the identification process is presented,
then the algorithms are derived, and a proof
of uniqueness is eventually given.
3.1. Moment-matching relations
Consider the following assumptions:
(AS1) The channel is linear-quadratic of finite known
length.
(AS2) The input is stationary, independent and identically
distributed (i.i.d.), and must comply with the
properties (2); $\sigma_x^2 = \epsilon_{11}$ and $\mu_{4x} = \epsilon_{22}$ are also
assumed to be known.
(AS3) The noise is signal-independent white Gaussian.
Let us now define the complex bicorrelation as:
$$C_{12y}(l,k) \stackrel{\mathrm{def}}{=} E\{ y^*(n)\, y(n+l)\, y(n+k) \} \qquad (3)$$
Under assumptions (AS1-AS3), the bicorrelation
of the output (3) and the channel model (1) should
match, which gives the following relations:
$$C_{12y}(l,k) = \sum_{i,j=0}^{L_2} h_1(l+i)\, h_1(k+j)\, \beta^*(i,j) \qquad (4)$$
with $(l,k) \in [-L_2, L_1] \times [-L_2, L_1]$, and where
$\beta^*(i,j) \stackrel{\mathrm{def}}{=} \left[ 2\epsilon_{11}^2 + \delta(i-j)\left( \epsilon_{22} - 2\epsilon_{11}^2 \right) \right] h_2^*(i,j)$. The
Z-transform of $C_{12y}(l,k)$ gives in the $(Z_1, Z_2)$ domain:
$$S_{12y}(Z_1, Z_2) = H_1(Z_1)\, H_1(Z_2)\, \mathcal{B}^*\!\left( 1/Z_1^*, 1/Z_2^* \right) \qquad (5)$$
where $\mathcal{B}$ denotes the two-dimensional Z-transform of $\beta$.
Equations (4) or (5) form the core of the algorithms
subsequently proposed. By stacking the elements of
$C_{12y}(l,k)$ in a matrix $\mathbf{C}_{12Y}$ as:
$$\mathbf{C}_{12Y} \stackrel{\mathrm{def}}{=} \begin{pmatrix} C_{12y}(-L_2,-L_2) & \cdots & C_{12y}(-L_2,L_1) \\ \vdots & & \vdots \\ C_{12y}(L_1,-L_2) & \cdots & C_{12y}(L_1,L_1) \end{pmatrix}$$
we get the matrix formulation:
$$\mathbf{C}_{12Y} = A^T B A \qquad (6)$$
where A is the $(L_2+1) \times (L_1+L_2+1)$ Upper Triangular
Band (UTB) Toeplitz matrix containing $h_1 = [h_1(0) \ldots h_1(L_1)]$
in the first row and zeros elsewhere:
$$A \stackrel{\mathrm{def}}{=} \begin{pmatrix} h_1(0) & \cdots & h_1(L_1) & & 0 \\ & \ddots & & \ddots & \\ 0 & & h_1(0) & \cdots & h_1(L_1) \end{pmatrix}$$
while B is symmetric complex and contains the values
of the kernel $\beta$:
$$B \stackrel{\mathrm{def}}{=} \begin{pmatrix} \beta^*(L_2,L_2) & \cdots & \beta^*(L_2,0) \\ \vdots & & \vdots \\ \beta^*(0,L_2) & \cdots & \beta^*(0,0) \end{pmatrix}$$
We propose to identify the channel coefficients by
using either relation (5), or (6) with the estimate
$$\hat C_{12y}(l,k) \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_{n=1}^{N} y^*(n)\, y(n+l)\, y(n+k).$$
One can notice that $\mathbf{C}_{12Y}$ is an $(L_1+L_2+1)$ square
matrix of rank $(L_2+1)$. This observation makes it possible to
detect the lengths of the channels $(h_1, h_2)$ from an estimate
$\hat{\mathbf{C}}_{12Y}$ of $\mathbf{C}_{12Y}$.
3.2. Proposed algorithms
We propose several algorithms: (i) a Root-Finding
method (RF), (ii) a Sub-Space Intersection method
(SSI), (iii) a method that forces the row span to have
certain triangular properties (UTB), and (iv) an iterative
Multidimensional Search method (MS).
(i) One can give several values to $Z_2$ in (5), and
get several functions of $Z_1$: $F_{Z_2}(Z_1)$. These functions
share the roots $r_i$ of $H_1(Z_1)$, which are detected
by clustering. The channel $h_1(n)/h_1(0)$ is the inverse
Z-transform of $\prod_{i=1}^{L_1} (1 - r_i Z^{-1})$, and one can build
A. Denoting by $A^-$ the Moore-Penrose pseudo-inverse of
A, $h_2$ is recovered via the “deconvolution”:
$$B = A^{T-} \cdot \mathbf{C}_{12Y} \cdot A^- . \qquad (7)$$
(ii) Alternatively, one can factorize the matrix
$\mathbf{C}_{12Y}$ in order to recover the vector $h_1$ in a similar
fashion as in [8]. In the noiseless case, given that B
has no null eigenvalue, the matrix model (6) clearly
implies that:
$$\mathrm{row}(A) = \mathrm{row}(\mathbf{C}_{12Y}), \qquad \mathrm{col}(A^T) = \mathrm{col}(\mathbf{C}_{12Y}) \qquad (8)$$
Considering the singular value decomposition (SVD) of
the symmetric complex matrix $\mathbf{C}_{12Y} = V^T \cdot S \cdot V$, we
define $\hat V$ as the $L_2+1$ first rows of V, associated with
the $L_2+1$ dominant singular values. Let $V^{(i)}$ be the
$(L_2+1) \times (L_1+1)$ submatrix extracted from $\hat V$ that gathers
the columns i to $L_1+i$. Then the conditions (8) are
restated as: $h_1 \in \mathrm{row}(V^{(i)})$, $\forall i \in [1, \ldots, L_2+1]$. Thus $h_1$ can
be obtained by computing the dominant right singular
vector of the matrix $\mathcal{V}$ containing all $V^{(i)}$ stacked one
above the other:
$$\mathcal{V} = \begin{pmatrix} V^{(1)} \\ \vdots \\ V^{(L_2+1)} \end{pmatrix}$$
Then the matrix B can be estimated afterwards by the
“deconvolution” procedure (7).
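The matrix model $A^T B A$ can be compared entry by entry with the moment relation (4). The sketch below does so in plain Python for the channel used in the simulations, assuming a unit-modulus QPSK input ($\epsilon_{11} = \epsilon_{22} = 1$) and assuming that the rows of A and B are ordered from index $L_2$ down to 0, as in the definition of B. Function names are illustrative.

```python
def beta_conj(h2, eps11, eps22):
    """beta*(i,j) = [2 eps11^2 + delta(i-j)(eps22 - 2 eps11^2)] h2*(i,j)."""
    n = len(h2)
    return [[(eps22 if i == j else 2 * eps11 ** 2) * complex(h2[i][j]).conjugate()
             for j in range(n)] for i in range(n)]

def bicorrelation(h1, bstar, l, k):
    """Right-hand side of eq. (4); h1 is taken as zero outside 0..L1."""
    L1, L2 = len(h1) - 1, len(bstar) - 1
    def h(m):
        return h1[m] if 0 <= m <= L1 else 0.0
    return sum(h(l + i) * h(k + j) * bstar[i][j]
               for i in range(L2 + 1) for j in range(L2 + 1))

def model_matrices(h1, bstar):
    """UTB Toeplitz matrix A and kernel matrix B (rows ordered i = L2, ..., 0)."""
    L2 = len(bstar) - 1
    A = [[0.0] * r + list(h1) + [0.0] * (L2 - r) for r in range(L2 + 1)]
    B = [[bstar[L2 - r][L2 - c] for c in range(L2 + 1)] for r in range(L2 + 1)]
    return A, B

def atba(A, B):
    """Plain-Python product A^T B A."""
    rows, cols = len(A), len(A[0])
    return [[sum(A[r][l] * B[r][c] * A[c][k] for r in range(rows) for c in range(rows))
             for k in range(cols)] for l in range(cols)]

h1 = [1, 0.5, -0.8, 1.6, 0.4]
h2 = [[1, 0.6], [0.6, -0.3]]
bstar = beta_conj(h2, eps11=1.0, eps22=1.0)
A, B = model_matrices(h1, bstar)
C = atba(A, B)

L1, L2 = len(h1) - 1, len(h2) - 1
mismatch = max(abs(C[l + L2][k + L2] - bicorrelation(h1, bstar, l, k))
               for l in range(-L2, L1 + 1) for k in range(-L2, L1 + 1))
```

For this channel the product is a 6 x 6 matrix built from a 2 x 2 kernel matrix B, so its rank cannot exceed $L_2 + 1 = 2$, in line with the rank remark of Section 3.1.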
(iii) Another technique consists of forcing the UTB
structure of A beforehand by combining the rows of
matrix $\hat V$; this is possible because of Lemma 1. Then,
one extracts the $L_2+1$ dimensional row vectors $v_i$
contained in the UTB matrix $T\hat V$, and stacks them in
a matrix $\mathcal{V}$. The rest of the procedure is identical to
the previous approach (ii).
(iv) Lastly, one can perform an iterative search in
the $(L_1 + L_2(L_2+1)/2)$ dimensional space of the matrix
product of (6) in order to find the parameters $\theta(h_1, h_2)$
that minimize the error in the sense of the Frobenius
norm:
$$\hat\theta(h_1, h_2) = \arg\min_{\theta} \left\| \mathbf{C}_{12Y} - \left[ A^T \cdot B \cdot A \right](\theta) \right\|_F^2$$
3.3. Uniqueness
Lemma 1 Let N and P be two positive integers. Under
certain regularity conditions, any $N \times (N+P)$
rectangular matrix M can be put in UTB form by
pre-multiplication by a square invertible matrix T. The
matrix T is unique up to an invertible diagonal multiplicative
matrix.
Proof: The constructive algorithm is very similar to
Gaussian elimination. Assume there are two matrices
$T_1$ and $T_2$ such that $M = T_1 U_1$ and $M = T_2 U_2$,
where $U_1$ and $U_2$ are UTB. Then, considering the
N first columns of both sides shows that the matrix
$T_1 T_2^{-1}$ relates two Lower Triangular (LT) matrices,
and is thus LT itself. Similarly, considering the N last
columns shows that $T_1 T_2^{-1}$ is Upper Triangular (UT).
Thus, it is diagonal, which eventually shows that $T_1$
and $T_2$ are related by a diagonal multiplicative matrix. □
Lemma 2 Any symmetric complex matrix C can be
factorized as $C = L L^T$, where L is lower triangular.
Matrix L is unique up to post-multiplication by a
diagonal matrix $\Lambda$ formed of signs $\{\pm 1\}$.
Proposition 3 If B is square full rank, and A is
UTB, then the decomposition of a complex symmetric
matrix $C = A^T B A$ is unique up to a multiplicative
diagonal matrix.
Proof: The proposition is a direct consequence of Lemmas
1 and 2. It is easily seen that if (A, B) is a solution,
then so is $(\Lambda A, \Lambda^{-1} B \Lambda^{-1})$, where $\Lambda$ is any diagonal
regular matrix. □
Corollary 4 Let B be full rank symmetric complex
and A Toeplitz UTB. When the decomposition of a symmetric
matrix as $\mathbf{C}_{12Y} = A^T \cdot B \cdot A$ exists, then it is
unique up to a scalar multiplicative factor.
Proof: From Proposition 3, if A is a solution, then so is
$\Lambda A$, with $\Lambda$ diagonal. But because A is Toeplitz, $\Lambda A$
can be Toeplitz only if $\Lambda$ is proportional to the identity
matrix. □
4. SIMULATIONS
In order to illustrate the Root-Finding (RF) method
step by step, we first present a typical example with
only the RF and MS methods. Later we show a more
exhaustive study with all the methods. Because we are
mainly interested in direct methods, the MS is given
only as a reference.
In all simulations, the input x was 4-PSK. We used
the real channel given by [9] ($h_1 = [1, 0.5, -0.8, 1.6, 0.4]$
and $h_2 = [1, 0.6; 0.6, -0.3]$).
Typical example: The input is QPSK; the number
of samples is 16284 points; and the SNR is 10 dB.
Figure I.a illustrates the clustering method. It shows
all the roots calculated for different $Z_2$, the true roots,
and the ones estimated by the method. The estimated
roots (stars) are fairly accurate and match the real ones
(squares). Figure I.b shows the spectra of the real and
estimated linear channels. Both estimated spectra are
fairly accurate.
Computer comparisons: A first study showed
that the estimation noise of $C_{12y}$ rapidly becomes predominant
over the additive noise contribution. As expected,
the Gaussian noise does not contribute to the third-order
moment as soon as the integration length is long
enough. So we mainly tried to estimate the influence
of the number of samples. For each number of samples
we took 1000 independent realizations, and the SNR
is 10 dB. For each realization, we estimated $\mathbf{C}_{12Y}$, to
which we applied all the algorithms. Since in our case
$\mathbf{C}_{12Y}$ is 6 x 6, the most computationally intensive step
for the direct methods is its estimation. Due to its iterative
nature, up to several thousand samples, the
most intensive step for the MS method is the multidimensional
search.
Figure II presents the influence of the integration
length on the mean and variance of both estimates.
Figures II.a and II.b show that all methods converge
to the true channel; the bias behaves well from
4096 points. The RF is the slowest method to converge
to the expected value, while the MS is the fastest
to converge. The SSI and the UTB follow similar patterns.
Figures II.c and II.d present the variances of both
estimates. The variances follow approximately a linear
slope. It is difficult to decide which method behaves
the best. One can notice that the MS has stationary
performance after 64000 samples. This is because this
method was implemented in a rather crude way, and
the MS algorithm occasionally gets stuck in
local minima, thus degrading the quality of the standard
deviation. While not visible on the figure, the best
method varies for each element of $h_1$, and generally
around 4096 samples the best method changes. Nevertheless,
above 4096 samples the best method is clearly
the SSI.
The variance clearly shows the usual problem with
High Order Statistics: in order to have a consistent high-order
moment estimate, the integration length must be
long enough: a minimum of 8192 seems to be required
here.
5. CONCLUDING REMARKS
Several methods have been proposed to blindly identify
a linear-quadratic channel for communication applications.
The idea is to exploit the specific properties of the
distribution of the inputs. The methods have been shown
to converge with good accuracy, given a rather large
number of samples.
REFERENCES
[1] S. BENEDETTO, E. BIGLIERI, V. CASTELLANI, Digital Transmission Theory, Prentice-Hall Inc., New Jersey, 1987.
[2] P. BONDON, M. KROB, “Blind identifiability of quadratic stochastic system”, IEEE trans. on Information Theory, vol. 41, no. 1, pp. 245-254, Jan. 1998.
[3] N. PETROCHILOS, “Elements for blind identification of non-linear channels”, supervision by G. Giannakis, Master’s thesis, ENSEA / ENS Lyon, Sept. 1996, in archive of ENSEA, France.
[4] E. J. POWERS, S. IM et al., “Applications of HOS to nonlinear hydrodynamics”, in IEEE-ATHOS Workshop on Higher-Order Statistics, Begur, Spain, 12-14 June 1995, pp. 414-418.
[5] W. J. RUGH, Nonlinear System Theory, Johns Hopkins Univ. Press, Baltimore, MD, 1981.
[6] M. SCHETZEN, The Volterra and Wiener Theories of Nonlinear Systems, Wiley, New York, 1980.
[7] H-Z. TAN, Z-Y. MAO, “Blind identifiability of quadratic non-linear systems in higher-order statistics domain”, Int. Jour. Adapt. Control Signal Processing, vol. 12, pp. 567-577, 1998.
[8] A. J. van der VEEN, S. TALWAR, A. PAULRAJ, “A subspace approach to blind space-time signal processing for wireless communication systems”, IEEE trans. on Signal Processing, vol. 45, no. 1, pp. 173-190, Jan. 1997.
[9] G.T. ZHOU, G.B. GIANNAKIS, “Nonlinear channel identification and performance analysis with PSK inputs”, in Proc. of 1st IEEE Signal Processing Workshop on Wireless Communications, Paris, France, 16-18 April 1997, pp. 337-340.
Figure I: Example of Identification: 10 dB, 16284 points.
Figure II: Means and standard deviations for all methods with 1000 independent realizations. Panels: evolution of the mean of H1, the variance of H1, the mean of H2 and the variance of H2 as functions of N. Simple line: RF, UTB, d-: SSI, o-: MS.
JOINT CHANNEL ESTIMATION AND DETECTION FOR INTERFERENCE
CANCELLATION IN MULTI-CHANNEL SYSTEMS
Cristoff Martin and Björn Ottersten
The Department of Signals, Sensors & Systems
Royal Institute of Technology (KTH)
SE-100 44 Stockholm, Sweden
ABSTRACT
Interference from other users and interference due to multipath
propagation limit the capacity of wireless communication
networks. As the number of users and the demand
for new services in the networks increase, co-channel interference
will be a limiting factor.
This paper proposes an iterative structured multi-channel
receiver algorithm that jointly estimates the communication
channels and the desired data while canceling interference.
A general way of adding training redundancy to a
data frame is also introduced.
Simulations show that the proposed method
achieves low bit error rates even in the presence of strong
interference. These simulations also show that by carefully
distributing the training information within a data burst, further
improvements in performance are achievable.
1. INTRODUCTION
During the last decades, a rapid development in mobile
communications has occurred. The seemingly ever increas¬
ing number of users and services has caused equally in¬
creasing demand for capacity and reliability. Because of
the physical limitations of radio communications and the
limited bandwidth available these demands are difficult to
meet.
One of the factors that limits capacity is the interfer¬
ence from other users, Co-Channel Interference or CCI. The
problem is further complicated by the fact that in realistic
wireless communication systems there will always be some
amount of multi-path propagation causing Inter-Symbol In¬
terference or ISI. Thus, by developing receivers that can
handle these kinds of interference, the capacity and relia¬
bility in the wireless network can be increased. One way of
combating interference is through the use of antenna arrays,
thus creating a multi-channel system. The receiver systems
considered in this paper are all multi-channel.
This paper considers an iterative algorithm that rejects
interference while simultaneously estimating the transmitted
data and the baseband transmission channels. The proposed
receiver is semi-blind, i.e., it uses training information
available for the desired user.
Several other approaches have been taken to reject in¬
terference. Iterative Least Squares with Projection (ILSP)
is introduced in [1, 2]. ILSP is a blind method to separate
several co-channel signals using the Finite Alphabet (FA)
property of digital communication signals. However, it
handles neither ISI nor training information in a natural
fashion. The method presented herein is similar to ILSP but
also takes ISI and training information into account. In [3]
an interference rejection algorithm is presented that, by
using ILSP, oversampling and an extra processing step, is
also able to handle ISI. Another method similar to ILSP is
proposed in [4]; it also handles training sequences and ISI,
but it does not account for the
structure imposed by the ISI. Another class of interference
rejection algorithms are subspace methods. These use al¬
gebraic subspace properties to reject interference based on
second order statistics. An example of such a method used
for comparison in this paper can be found in [5].
2. DATA MODEL
An L element antenna, with symbol spaced base band sam¬
pling is considered. For simplicity only one desired user and
one interfering user is considered (even though the data
model and proposed receiver algorithm easily can be ex¬
tended to multiple users and interferers). The interferer is
assumed to be using the same modulation scheme as, and be
burst synchronized with, the desired user. Within a burst
the user and the interferers send one data frame consisting
of N symbols of which Nd symbols are unknown data and
the rest are used for training purposes. The radio chan¬
nels between the transmitters and the receiving antennas
are assumed to be time invariant within one data frame. It
is also assumed that the transmission process between the
transmitter and the receiver, including the effects of the
transmitter and receiver filters can be modeled as a FIR
filter of length M. It is then possible to model the received
data as
$$X = HS + GD + V \qquad (1)$$
where X (which is L × (N + M − 1)) contains the data
received by the antenna array. The channel matrices H and
G (both L × M) describe the transmission process for the
desired user and the interferer, respectively. The
transmitted data is contained in S and D (both
M × (N + M − 1)) while V models additive noise. The
received data matrix X is organized as
$$X = [\, x(1) \;\; x(2) \;\; \cdots \;\; x(N + M - 1) \,]$$
where x(n) is a column vector containing the data output
from the array at the nth sampling instant. To exemplify
the organization of the data matrices, the data matrix of
the desired user is
$$S = \begin{bmatrix} s^T & & 0 \\ & \ddots & \\ 0 & & s^T \end{bmatrix} \qquad (2)$$
where s is a vector containing the data symbols transmitted
in one frame, so that each of the M rows holds $s^T$ shifted
one position to the right relative to the row above. From
(2) the structure of the data matrices becomes obvious. In
order to achieve good performance a receiver algorithm must
preserve this structure.
0-7803-5988-7/00/$10.00 © 2000 IEEE
3. PROBLEM FORMULATION
The problem of estimating the unknown data vectors and
channel matrices is considered. It is assumed that training
information is available for the desired user while it is
unknown for the interferer. The transmission of the data is
disturbed by additive complex Gaussian noise that is
spatially and temporally white.
The goal is to find the maximum likelihood estimates
of H, S, G and D. That is, the H, S, G and D that
minimize
$$\| X - HS - GD \|_F^2 \qquad (3)$$
taking the finite alphabet property of the signals into
account. Note that given the data symbols, the criterion
is quadratic in the channel matrices. After rewriting this
norm as
$$\| X - HS - GD \|_F^2 = \left\| X - [H \;\; G] \begin{bmatrix} S \\ D \end{bmatrix} \right\|_F^2 \qquad (4)$$
it can be minimized with respect to [H G],
$$[H \;\; G] = X \begin{bmatrix} S \\ D \end{bmatrix}^{\dagger} = X \begin{bmatrix} S \\ D \end{bmatrix}^{*} \left( \begin{bmatrix} S \\ D \end{bmatrix} \begin{bmatrix} S \\ D \end{bmatrix}^{*} \right)^{-1} \qquad (5)$$
where $A^{\dagger}$ denotes the pseudo-inverse of A. After
substituting H and G back into (4), a minimization
criterion depending only on S and D is obtained,
$$\min_{S,D} \left\| X P^{\perp}_{\left[\begin{smallmatrix} S \\ D \end{smallmatrix}\right]} \right\|_F^2 \qquad (6)$$
where $P^{\perp}_A = I - A^{*}(AA^{*})^{-1}A$ and I is the identity
matrix. It is now possible to find the global minimum by
enumerating all possible S and D using their FA property
and known training information while maintaining the
structure of the matrices. The enumeration, however, has
exponential complexity, which makes it infeasible even for
modest data frame sizes. The following sections consider a
suboptimal method that attempts to minimize (3) with less
computational complexity.
4. PROPOSED ALGORITHM - OUTLINE
The algorithm proposed in this paper takes an iterative
approach to minimize (3) while maintaining the structure of
the data matrices (see (2)). Known training information is
also taken into account. The iterative procedure of the
proposed algorithm is similar to the ILSP algorithm proposed
in [1, 2].
Assuming that initial estimates of the data sequences
are available, the method can be outlined as
1. Assume that the estimated data sequences are correct.
The norm (3) is now quadratic in H and G
and it is easy to estimate the channel matrices.
2. Rewrite the norm (3) so that it can be minimized in
a way that maintains the structure of S and D and
takes available training information into account.
3. Now, assume that the estimated channel matrices are
correct. The norm (3) becomes quadratic in S and
D if we relax the FA property. Thus, it is possible
to estimate the unknown data symbols by solving a
linear set of equations.
4. Project the data on its finite alphabet.
5. Repeat the steps above until convergence.
If the initial data estimates are good enough the method
will in general converge to the desired global minimum of (3)
and the initial data estimates are improved.
The proposed method also makes it possible to generalize
how the training information is added to the data
sequence. This is considered in the following section. A
more detailed description of the algorithm can be found in
section 6.
5. GENERALIZED TRAINING USING CODE MATRICES
When a training sequence is added to a data frame it is
usually simply inserted either at the beginning or in the
middle of the data frame. Here a more general way of adding
the training data is introduced by the affine mapping
$$s = C_1 s_d + C_0 \qquad (7)$$
where s (N × 1) contains the data to be transmitted (data
and training information) and $s_d$ ($N_d$ × 1) contains the data
without training information. $C_1$ (N × $N_d$) and $C_0$ (N × 1)
are Code Matrices that add training information (and
possibly error correcting redundancy) to the data.
It is obvious that the code matrices can be chosen so
that training information is added to the data sequence in
the conventional way described above. However, this also
provides the opportunity of adding training information
in a more elaborate way. For example, the training
information can be distributed over the entire data sequence.
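As an illustration of the affine mapping (7), the following minimal sketch builds code matrices for a block of training symbols at the start of the frame and for training symbols spread over the frame. The helper name `code_matrices`, the evenly spaced training positions and the random training symbols are assumptions made for this example only; the frame sizes (42 data symbols, 15 training symbols) follow the simulation set-up described later in the paper.

```python
import numpy as np

def code_matrices(n_data, training, positions):
    """Build C1 (N x Nd) and C0 (N x 1) such that s = C1 @ s_d + C0.

    `positions` lists the frame indices that carry the known training
    symbols; all remaining indices carry the data symbols in order.
    """
    training = np.asarray(training, dtype=float)
    positions = np.asarray(positions, dtype=int)
    n_total = n_data + training.size
    is_training = np.zeros(n_total, dtype=bool)
    is_training[positions] = True
    data_rows = np.flatnonzero(~is_training)
    C1 = np.zeros((n_total, n_data))
    C1[data_rows, np.arange(n_data)] = 1.0   # copy each data symbol into place
    C0 = np.zeros((n_total, 1))
    C0[positions, 0] = training              # insert the known training symbols
    return C1, C0

N_D, N_T = 42, 15                            # data and training symbols per frame
rng = np.random.default_rng(0)
training = rng.choice([-1.0, 1.0], size=N_T)      # hypothetical training sequence
s_d = rng.choice([-1.0, 1.0], size=(N_D, 1))      # antipodal data symbols

# Conventional placement: all training symbols at the beginning of the frame.
C1_block, C0_block = code_matrices(N_D, training, np.arange(N_T))

# Distributed placement: training symbols spread evenly over the frame
# (one possible choice; the paper does not specify the exact positions).
spread = np.round(np.linspace(0, N_D + N_T - 1, N_T)).astype(int)
C1_dist, C0_dist = code_matrices(N_D, training, spread)

s_block = C1_block @ s_d + C0_block
s_dist = C1_dist @ s_d + C0_dist
```

Both frames carry the same data and training symbols; only the selection pattern encoded in the code matrices differs.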
6. PROPOSED ALGORITHM - DETAILS
The steps of the proposed algorithm outlined in section 4
are presented in more detail in this section. It is assumed
that an initial estimate of the unknown user data and the
interferer data is available. Further, it is assumed that the
code matrices $C_0$ and $C_1$ are known for the desired user
while they are not available for the interferer.
6.1. Estimating the Channel Matrices
If we assume the estimated data sequences to be correct, a
least squares estimate of the channel matrices can be found
as (see (5))
$$[H \;\; G] = X \begin{bmatrix} S \\ D \end{bmatrix}^{\dagger}.$$
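The least squares step can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the data sequences are taken to be perfectly known, the noise is set to zero, and the helper `data_matrix` (a name introduced here) builds the shifted-row data matrix of the data model.

```python
import numpy as np

def data_matrix(s, M):
    """Return the M x (N + M - 1) matrix whose m-th row is s delayed by m samples."""
    N = s.size
    S = np.zeros((M, N + M - 1))
    for m in range(M):
        S[m, m:m + N] = s
    return S

rng = np.random.default_rng(1)
L, M, N = 4, 2, 57                       # antennas, channel taps, frame length

s = rng.choice([-1.0, 1.0], size=N)      # desired user symbols (assumed known here)
d = rng.choice([-1.0, 1.0], size=N)      # interferer symbols (assumed known here)
H = (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))) / np.sqrt(2)
G = (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))) / np.sqrt(2)

S, D = data_matrix(s, M), data_matrix(d, M)
X = H @ S + G @ D                        # received data with the noise term set to zero

stacked = np.vstack([S, D])              # [S; D], size 2M x (N + M - 1)
HG = X @ np.linalg.pinv(stacked)         # least squares estimate of [H G]
H_hat, G_hat = HG[:, :M], HG[:, M:]
```

In this noise-free setting with correct data the estimate reproduces the true channels; with noise or symbol errors it is only the least squares fit.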
6.2. Maintaining the Structure of the Data Matrices
To maintain the structure of the data matrices while
estimating them, the norm (3) must be rewritten. This can
be achieved using properties of the vec operator and the
Kronecker product. Letting vec denote the vec operator,
$\otimes$ denote the Kronecker product and I denote the identity
matrix, this rewriting can be done in a few steps as follows,
$$\operatorname{vec}\{X - HS - GD\} = \operatorname{vec} X - (I \otimes H)\operatorname{vec} S - (I \otimes G)\operatorname{vec} D. \qquad (8)$$
To simplify notation, let $\Phi_H = I \otimes H$, $\Phi_G = I \otimes G$, and
$x = \operatorname{vec} X$. Also, the (NM × N) selection matrix $\Psi$ is
defined. The matrix $\Psi$ consists of zeros and ones and takes
a data vector to a vectorized data matrix, i.e. $\operatorname{vec} S = \Psi s$
and $\operatorname{vec} D = \Psi d$. Now, (8) can be rewritten as
$$\begin{aligned}
\operatorname{vec}\{X - HS - GD\} &= x - \Phi_H \Psi s - \Phi_G \Psi d \\
&= x - \Phi_H \Psi C_1 s_d - \Phi_H \Psi C_0 - \Phi_G \Psi d \\
&= x - \Phi_H \Psi C_0 - [\Phi_H \Psi C_1 \;\; \Phi_G \Psi] \begin{bmatrix} s_d \\ d \end{bmatrix}
\end{aligned} \qquad (9)$$
where the middle step follows from (7). By using (9) the
norm (3) can now be minimized with respect to the data
while maintaining the structure of the data matrices S, D.
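The vec and Kronecker product identities used above can be checked numerically. The sketch below is illustrative only: `data_matrix`, `selection_matrix` and `vec` are names introduced here, and the selection matrix is sized M(N + M − 1) × N so that it matches vec S for the full M × (N + M − 1) data matrix of the data model, which is a choice made for this sketch.

```python
import numpy as np

def data_matrix(s, M):
    """M x (N + M - 1) matrix whose m-th row is s delayed by m samples."""
    N = s.size
    S = np.zeros((M, N + M - 1))
    for m in range(M):
        S[m, m:m + N] = s
    return S

def selection_matrix(N, M):
    """0/1 matrix Psi such that vec(data_matrix(s, M)) == Psi @ s."""
    K = N + M - 1
    Psi = np.zeros((M * K, N))
    for m in range(M):
        for n in range(N):
            col = m + n                  # column of S that holds s[n] in row m
            Psi[col * M + m, n] = 1.0    # column-major (vec) index of that entry
    return Psi

def vec(A):
    """Stack the columns of A into one long vector."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(2)
L, M, N = 4, 2, 10
K = N + M - 1
s = rng.choice([-1.0, 1.0], size=N)
H = rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))

S = data_matrix(s, M)
Psi = selection_matrix(N, M)
Phi_H = np.kron(np.eye(K), H)            # I (x) H, with I of size K x K

lhs = vec(H @ S)                         # vec{HS}
rhs = Phi_H @ Psi @ s                    # Phi_H Psi s
```

Because the data vector s appears explicitly on the right-hand side, any estimate obtained in this form automatically respects the shifted-row structure of S.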
6.3. Estimating the Received Data
By using (9) and assuming the estimated channels to be
correct, we now obtain continuous estimates of the unknown
data vectors $s_d$ and d. This can be done in much the same
way as the estimation of the channel matrices, which results
in
$$\begin{bmatrix} s_d \\ d \end{bmatrix} = [\Phi_H \Psi C_1 \;\; \Phi_G \Psi]^{\dagger} \, (x - \Phi_H \Psi C_0). \qquad (10)$$
The unknown data can then be estimated by projecting the
continuous data estimates onto the finite alphabet in use.
Finally, the three steps above are iterated until convergence
is reached. If the initial estimates are good enough
they are in general improved.
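Putting the three steps together, the iteration can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function and helper names are introduced here, the alphabet is antipodal (±1), training occupies the start of the frame, and the noise is set to zero. The demonstration starts from the true data, so it only checks that the true data is a fixed point of the iteration; it says nothing about convergence from poor initial estimates.

```python
import numpy as np

def data_matrix(s, M):
    """M x (N + M - 1) matrix whose m-th row is s delayed by m samples."""
    N = s.size
    S = np.zeros((M, N + M - 1))
    for m in range(M):
        S[m, m:m + N] = s
    return S

def selection_matrix(N, M):
    """0/1 matrix Psi such that vec(data_matrix(s, M)) == Psi @ s."""
    K = N + M - 1
    Psi = np.zeros((M * K, N))
    for m in range(M):
        for n in range(N):
            Psi[(m + n) * M + m, n] = 1.0
    return Psi

def project(z):
    """Project continuous estimates onto the antipodal alphabet {-1, +1}."""
    return np.where(np.real(z) >= 0, 1.0, -1.0)

def iterate_receiver(X, C1, C0, s_d, d, M, max_iter=25):
    """Alternate channel and data estimation until the data stops changing."""
    N, N_d = C1.shape
    K = X.shape[1]
    Psi = selection_matrix(N, M)
    x = X.reshape(-1, order="F")                     # x = vec X
    c0 = C0[:, 0]
    for _ in range(max_iter):
        # Channel step: least squares with the current data, see (5).
        s = C1 @ s_d + c0
        stacked = np.vstack([data_matrix(s, M), data_matrix(d, M)])
        HG = X @ np.linalg.pinv(stacked)
        H, G = HG[:, :M], HG[:, M:]
        # Structured rewriting, see (9).
        Phi_H = np.kron(np.eye(K), H)
        Phi_G = np.kron(np.eye(K), G)
        B = np.hstack([Phi_H @ Psi @ C1, Phi_G @ Psi])
        # Data step: continuous least squares (10), then projection.
        z = np.linalg.pinv(B) @ (x - Phi_H @ Psi @ c0)
        new_s_d, new_d = project(z[:N_d]), project(z[N_d:])
        converged = np.array_equal(new_s_d, s_d) and np.array_equal(new_d, d)
        s_d, d = new_s_d, new_d
        if converged:
            break
    return s_d, d, H, G

rng = np.random.default_rng(3)
L, M, N_d, N_t = 4, 2, 42, 15
N = N_d + N_t
C1 = np.vstack([np.zeros((N_t, N_d)), np.eye(N_d)])  # training block first
C0 = np.zeros((N, 1))
C0[:N_t, 0] = rng.choice([-1.0, 1.0], size=N_t)
s_d_true = rng.choice([-1.0, 1.0], size=N_d)
d_true = rng.choice([-1.0, 1.0], size=N)
H_true = rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))
G_true = rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))
X = (H_true @ data_matrix(C1 @ s_d_true + C0[:, 0], M)
     + G_true @ data_matrix(d_true, M))

s_d_hat, d_hat, H_hat, G_hat = iterate_receiver(
    X, C1, C0, s_d_true.copy(), d_true.copy(), M)
```

The Kronecker matrices are formed explicitly here for readability; their size grows with the frame length, which is consistent with the computational cost noted for the method.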
7. PRELIMINARY RESULTS
To give some insight to the kind of performance that the
proposed algorithm might offer, simulations have been con¬
ducted and the results from these are presented in this sec¬
tion. In order to offer some comparison with previous work,
the structured subspace receiver described in [5] was sim¬
ulated under the same conditions and results from these
simulations are provided.
Two different sets of code matrices were used (see sec¬
tion 5). One conventional with all the training symbols in
the beginning of the sequence and one with the training
symbols spread over the entire sequence. In the simula¬
tions of the structured subspace receiver the entire training
sequence was located in the beginning of the data frame.
In all cases an L = 4 antenna system was considered.
An antipodal binary modulation scheme was employed (this
would for example correspond to BPSK).
To model the transmission process (the transmit¬
ter/receiver filters and the radio channel) a two tap FIR
channel model was used. The channels were assumed inde¬
pendent from antenna to antenna and to simulate Rayleigh
fading the channel taps were independently drawn from a
complex Gaussian distribution.
In the simulations it was assumed that the length of the
channel impulse responses, M, and the number of transmit¬
ters, U, are known or have been correctly estimated.
To offer some idea about what the achievable perfor¬
mance would be, a simple initialization scheme was em¬
ployed. Interferer data was initialized with its continuous
solution (of the minimization of the norm (4), ignoring the
structure of the data matrices, see e.g [2]) projected to the
finite alphabet in use. The desired user data was
initialized with random data symbols. Received sequences for
which the resulting norm (3) was smaller than the true norm
(the norm (3) achieved using the true data and channel
matrices) plus one standard deviation of the norm were kept,
while received sequences not fulfilling this criterion were
identified as outliers.
In figure 1 the bit error rate performance of the proposed
method as a function of the Signal to Noise Ratio (SNR)
is shown. The desired user is disturbed by a single inter¬
ferer. The Signal to Interference Ratio (SIR) in these sim¬
ulations was —10 dB. The results from the simulated pro¬
posed method are compared with the structured subspace
method with estimated channels and with known channels.
Also, the two different sets of training matrices (described
above) are compared. The data frames consist of 57 sym¬
bols of which 42 are data symbols and the rest are used
for training purposes. Under these conditions the proposed
method performs on par with the structured subspace
method using perfect channel estimates. The structured
subspace method by itself needs longer training sequences
in order to perform well (see figure 4). The distributed
training information offers slightly better performance than
the conventional training sequence. Even though the
difference in performance is small, this is interesting since
both training distributions use the same number of training
and data bits; only their placement differs.
Figure 1: Performance with a single -10 dB interferer
present (bit error rate versus Signal to Noise Ratio in dB).
To explore the loss in performance due to the interference,
the proposed method was simulated with and without
an interferer. Other than that, the simulated conditions
were identical to the previous simulation. The results
from these simulations are shown in figure 2. As can
be seen from the graph, at an SNR of 4 dB the loss is
approximately 1.5 dB, both with the conventional training
sequence and with the distributed training information.
Again, slightly lower bit error rates were achieved when the
distributed training information was used compared to the
more conventional training data distribution.
The number of data frames not converging to a norm
small enough, the rejection rate, was also measured under
the same conditions as in the previous simulations. Figure 3
shows the results from these measurements. As can be seen
from the graph, when there is CCI present the rejection
rate becomes quite high and it would be desirable to use a
better initialization method.
The effect of the length of the training sequence was
also examined. Once again the proposed algorithm
with the two different training distributions and
the structured subspace method (found in [5]) were
compared. Figure 4 shows the bit error rate of the desired data
sequence as a function of the number of training symbols
and figure 5 shows the rejection rate as a function of the
number of training symbols. These simulations were
performed at an SNR of 4 dB, with and without a single -10
dB co-channel interferer. The number of data symbols in
each frame remained 42. From figure 4 it can also be seen
that the proposed method is less sensitive to short training
sequences than the method used for comparison. Figure 5
shows that the number of rejected sequences increases
rapidly when the number of training symbols drops below 15.
It seems likely that the convergence criterion affects the
simulated bit error rates when the number of training
symbols becomes smaller than that.
Figure 2: Performance lost due to interference (versus Signal to Noise Ratio in dB).
Figure 3: Rejection rates as functions of the SNR.
Figure 4: Error rates at an SNR of 4 dB (versus number of training symbols).
Figure 5: Rejection rates at an SNR of 4 dB.
As can be seen from the results above, the proposed
method shows promising performance. However, there
are still several issues that require further investigation. For
example, in its current implementation the proposed
receiver algorithm is computationally expensive. Robustness
to model errors and to initialization are other issues
that deserve more attention. More general forms of training
information, where the data is confined to more general
affine mappings, can easily be considered with the proposed
method.
8. CONCLUSIONS
Herein, we have presented an interference cancellation
method that can be applied to multi-channel data. Training
information from the desired user is exploited and the
communication channels are jointly estimated together with
the unknown data symbols of both the desired user and the
interferer. This method can easily treat general forms of
training information, and a simple example with distributed
training information was shown to give improved performance
compared to a block of training data.
9. REFERENCES
[1] S. Talwar, M. Viberg, and A. Paulraj, "Blind estimation
of multiple co-channel digital signals using an antenna
array," IEEE Signal Processing Letters, vol. 1, February
1994.
[2] S. Talwar, M. Viberg, and A. Paulraj, "Blind separation
of synchronous co-channel digital signals using an
antenna array - part I: Algorithms," IEEE Transactions
on Signal Processing, vol. 44, pp. 1184-1197, May 1996.
[3] A.-J. van der Veen, S. Talwar, and A. Paulraj, "Blind
identification of FIR channels carrying multiple finite
alphabet signals," in Proc. of ICASSP, vol. 2, pp. 1213-1216,
1995.
[4] J. Laurila, R. Tschofen, and E. Bonek, "Semi-blind
space-time estimation of co-channel signals using least
squares projections," in Proceedings of the Vehicular
Technology Conference (VTC 1999 - Fall), vol. 3,
pp. 1310-1315, Sept. 1999.
[5] G. Klang and B. Ottersten, "Channel estimation and
interference rejection for multichannel systems," in
Proceedings of the 32nd Asilomar Conference on Signals,
Systems and Computers, (Pacific Grove, CA, USA), Nov.
1998.
A SPATIAL CLUSTERING SCHEME FOR DOWNLINK BEAMFORMING IN
SDMA MOBILE RADIO
Wen-Jye Huang and John F. Doherty
Department of Electrical Engineering
The Pennsylvania State University
University Park, PA 16802
E-mail: {wxhl48,jfdoherty}@psu.edu
ABSTRACT
In this paper we propose a new approach that clusters
mobile users before downlink beamforming and broadens
beams and nulls within the beamforming calculation.
We first investigate the broadening beamforming
scheme to alleviate inaccuracies in DOA estimation.
Next we examine how to group the mobile users, subject
to a constraint on the separation angle, to enhance
downlink beamforming. Simulations show that the downlink
beamforming complexity is decreased dramatically
with limited performance loss.
1. INTRODUCTION
Owing to the rapidly growing demand for mobile
communication, the capacity of current mobile communication
systems faces a severe challenge during peak usage.
To remedy this capacity limitation, space-division
multiple-access (SDMA), which increases system capacity
and decreases co-channel interference, has
been investigated.
A basic idea of SDMA is to spatially separate the
mobile users, which allows reuse of limited radio re¬
sources, such as frequency, time, or code slot within
a cell. SDMA relies on the application of an adap¬
tive array antenna at the base station to form mul¬
tiple beam patterns, which serve multiple user traffic
channels. Therefore the capacity of the system can be
increased.
Prior research shows that implementing SDMA on
the downlink increases the channel capacity [1], [2],
[3]. One simple SDMA approach uses the DOA esti¬
mated from uplink data and forms the spatial signature
for downlink transmission. However, in urban environ¬
ments, angular spreads (AS) could be up to 15° [4],
which means the estimated downlink beamforming pat¬
tern may degrade system performance due to narrow,
misaligned nulls. In addition, if the user DOAs are
not well separated, SDMA cannot provide much system
performance improvement. Furthermore, the downlink
beamforming algorithm needs extensive computation
power to solve a nonlinear optimization problem in¬
volving a nonlinear constraint weight vector for every
user [5]. This limits the applicability of this approach
for low complexity, real-time operation.
This paper proposes a new approach that clusters
(groups) mobile users before the downlink beamform¬
ing calculation. This approach alleviates the computa¬
tional complexity problem and the spatial separability
problem. The algorithmic block diagram is shown in
Figure 1. By carefully choosing the AS and applying the
same beamforming weight vector $w_{group}$ to all users in
the same group, the simulation results show that the
clustering scheme is within 3 dB of the conventional method,
with a dramatic decrease in computational complexity.
Figure 1: New Cluster Algorithm for Downlink Beamforming
(block diagram with DOA Estimation, Cluster Scheme, and
Weight Calculation and Select blocks)
2. DATA MODEL
We assume that K users are served within the same cell
by the base station with a uniform linear array (ULA)
antenna consisting of M identical, omnidirectional sensors,
equally spaced at distance d. A narrowband signal
model is assumed and the baseband signal received at
time t with $L_k$ paths for the kth user is:
$$x(t) = \sum_{k=1}^{K} \sum_{l=1}^{L_k} A_{kl}\, a(\theta_{kl}, f_u)\, s_k(t - \tau_{kl}) + n(t) \qquad (1)$$
where n(t) is spatially and temporally white Gaussian
noise and the array steering vector $a(\theta, f_u)$ is given
by
$$a(\theta, f_u) = \left[ 1,\; e^{-j 2\pi d \frac{f_u}{c} \sin\theta},\; \ldots,\; e^{-j 2\pi d \frac{f_u}{c} (M-1) \sin\theta} \right]^T \qquad (2)$$
where $A_{kl}$ is the amplitude of the lth path of the
kth user, $s_k(t)$ is the baseband signal transmitted by
the kth mobile, $\tau_{kl}$ is its corresponding delay and c
denotes the propagation speed.
From the received uplink signal, it is possible to
estimate the spatial covariance matrix, which contains
the directional information of the mobile radio channel
(dominant DOAs $\theta_{kl}$) and the corresponding power for
each user. It can be written as follows:
$$R_k = \sum_{l=1}^{L_k} A_{kl}^2\, a(\theta_{kl}, f_d)\, a^H(\theta_{kl}, f_d) \qquad (3)$$
Similarly we define the interference covariance matrix
$Q_k$ as
$$Q_k = \sum_{i \neq k} R_i + \sigma_N^2 I \qquad (4)$$
where $\sigma_N^2$ and I denote the white noise variance and
the M × M identity matrix, respectively.
The goal of downlink beamforming is to design
weight vectors $w_{kd}(f_d; t)$ that deliver the constrained
power to the desired user while minimizing the energy
transmitted towards the undesired users. In other words,
we want to maximize the SINR (Signal to Interference plus
Noise Ratio) for the kth user [6]:
$$w_{kd} = \arg\max_{w_{kd}} \frac{w_{kd}^H R_k w_{kd}}{w_{kd}^H Q_k w_{kd}} \qquad (5)$$
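The solution of (5) is proportional to the generalized
eigenvector of the matrix pair $[R_k, Q_k]$ associated with
the largest generalized eigenvalue, $e_{\max}$ [3]:
$$w_{kd} = \sqrt{\frac{x_k}{e_{\max}^H R_k\, e_{\max}}}\; e_{\max}, \qquad \text{so that} \quad w_{kd}^H R_k w_{kd} = x_k \qquad (6)$$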
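The SINR maximization in (5) can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: the function names, the half wavelength spacing, the unit path powers, the noise variance of 0.1 and the scaling of the weight so that the power delivered to the desired user equals a prescribed value `x_k`. The generalized eigenproblem is solved through the ordinary eigendecomposition of $Q_k^{-1} R_k$.

```python
import numpy as np

def steering(theta_deg, M, spacing=0.5):
    """ULA steering vector; `spacing` is the element spacing in wavelengths."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * spacing * m * np.sin(np.deg2rad(theta_deg)))

def covariance(thetas_deg, powers, M):
    """Sum of power-weighted outer products of steering vectors."""
    R = np.zeros((M, M), dtype=complex)
    for theta, power in zip(thetas_deg, powers):
        a = steering(theta, M)
        R += power * np.outer(a, a.conj())
    return R

def downlink_weight(R_k, Q_k, x_k=1.0):
    """Dominant generalized eigenvector of (R_k, Q_k), scaled so w^H R_k w = x_k."""
    vals, vecs = np.linalg.eig(np.linalg.solve(Q_k, R_k))
    e_max = vecs[:, np.argmax(vals.real)]
    scale = np.sqrt(x_k / np.real(e_max.conj() @ R_k @ e_max))
    return scale * e_max

def sinr(w, R_k, Q_k):
    """Ratio of desired power to interference-plus-noise power for weight w."""
    return np.real(w.conj() @ R_k @ w) / np.real(w.conj() @ Q_k @ w)

M = 8
R_k = covariance([90.0], [1.0], M)                       # desired user
Q_k = covariance([40.0, 120.0, 140.0], [1.0, 1.0, 1.0], M) + 0.1 * np.eye(M)

w = downlink_weight(R_k, Q_k, x_k=1.0)
best_sinr_db = 10 * np.log10(sinr(w, R_k, Q_k))
```

No other weight vector attains a higher value of the ratio in (5) than the dominant generalized eigenvector, which is why a single eigendecomposition per user (or per group) suffices.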
3. TARGET AND NULL BROADENING
The existence of angular spread (AS) causes DOA
estimation error, which adversely affects the downlink
beamforming process. The SINR degrades because the
maximum transmitted power is not directed at the desired
user, or because the nulls pointed towards the
cochannel users are too narrow. The method presented
in this section makes the SINR more robust to DOA
estimation error. The angular spread based approach
[7], [8] can steer broadened beams towards the users of
interest and broadened nulls towards the cochannel
users. Modified versions of the spatial and interference
covariance matrices can be written as:
$$\tilde{R}_k = R_k \odot S_{\max} \qquad (7)$$
$$\tilde{Q}_k = Q_k \odot S_{\max} \qquad (8)$$
with $[S_{\max}]_{pq} = e^{-2[\pi (p-q)]^2 \sigma_{\max}^2}$,
where $\odot$ and $[\cdot]_{pq}$ denote the Schur-Hadamard
element-by-element matrix product and the pqth element of a
matrix, respectively. The variable $\sigma_{\max}^2$ quantifies the
angular spread (AS) of the corresponding DOAs.
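The effect of the element-by-element taper can be sketched as follows. The sketch assumes a Gaussian taper of the form $[S_{\max}]_{pq} = \exp(-2[\pi(p-q)]^2\sigma_{\max}^2)$; the exact scaling inside the exponent, the value of `sigma_max` and the function names are assumptions of this example rather than values taken from the paper.

```python
import numpy as np

def broadening_taper(M, sigma_max):
    """Gaussian taper matrix that depends only on the index difference p - q."""
    idx = np.arange(M)
    diff = idx[:, None] - idx[None, :]
    return np.exp(-2.0 * (np.pi * diff) ** 2 * sigma_max ** 2)

def broaden(R, sigma_max):
    """Schur-Hadamard (element-by-element) product of R with the taper."""
    return R * broadening_taper(R.shape[0], sigma_max)

M = 8
a = np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(40.0)))
R = np.outer(a, a.conj())                 # rank-one covariance of a point source
R_broad = broaden(R, sigma_max=0.05)      # hypothetical spread parameter

rank_before = np.linalg.matrix_rank(R)
rank_after = np.linalg.matrix_rank(R_broad)
```

The taper leaves the diagonal (and hence the total power) unchanged but raises the rank of a point-source covariance, which is what widens the corresponding beam or null.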
By using the target and null broadening technique in
downlink beamforming, the beamformer design becomes
more robust in the mobile communication environment.
In addition, the beamforming weights remain valid
for a longer time, so fewer calculations are required [8].
Figure 2 shows the beam pattern with and without
the broadening technique. It is clear that applying
the broadening technique solves the problem of narrow
interference nulls. Although it introduces some
increase in the SINR perturbation, the worst-case
effect of DOA estimation error is still negligible [6].
4. GROUPING AND DOWNLINK
BEAMFORMING ALGORITHM
Two conditions limit the performance and capacity of
SDMA systems:
1. Users that share same channel allocation are co¬
located, within the resolution of the beam pat¬
tern;
2. Co-channel, co-located users have disparate pow¬
ers, causing the so-called “near-far problem.”
A proposed solution to the near-far problem is grouping
the mobile users within power classes before downlink
beamforming [9].
Figure 2: Conventional Beamforming vs. Beamforming
with Broadening Target and Null Technique with
Target at 90° and Null at 40°
Utilizing the advantage of the target and null broadening
method, and the existence of angular spreading
(AS), we propose a grouping algorithm that is constrained
by the angular separation of the users within a cell. By
grouping all the users in a cell before downlink beamforming
and calculating the downlink beamforming weight only
once per group, the computational complexity
for the base station is decreased dramatically with
a tolerable performance loss.
The basic approach of the grouping and downlink
beamforming calculation algorithm within a cell is the
following:
1. Determine the angle separation $\Delta\theta$ for each group,
typically using the angular spreading (AS) as a
parameter;
2. Assign users to the same group if $\Delta\theta < \mathrm{AS}$;
3. Determine the representative angle for each group,
typically choosing the highest energy interference
source within a group as the representative;
4. Calculate the downlink beamforming weight $w_{new}$
for each group;
5. Apply the weight $w_{new}$ to each user in the same
group.
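The grouping steps above can be sketched as a simple greedy pass over the sorted DOAs. This is one possible reading of the steps, not the authors' implementation: the function name, the rule of measuring the separation from the first member of the current group, and the example DOAs and powers are all assumptions made for illustration.

```python
import numpy as np

def group_users(doas_deg, powers, angular_spread_deg):
    """Group users whose DOA lies within the angular spread of the group start.

    Returns the groups (lists of user indices) and, for each group, the
    index of its strongest user, used as the group representative.
    """
    order = [int(i) for i in np.argsort(doas_deg)]
    groups, current = [], [order[0]]
    for idx in order[1:]:
        if doas_deg[idx] - doas_deg[current[0]] < angular_spread_deg:
            current.append(idx)          # close enough: same group
        else:
            groups.append(current)       # start a new group
            current = [idx]
    groups.append(current)
    representatives = [max(g, key=lambda i: powers[i]) for g in groups]
    return groups, representatives

doas = np.array([40.0, 90.0, 93.0, 120.0, 140.0, 143.0])   # hypothetical DOAs
powers = np.array([1.0, 0.5, 2.0, 1.0, 3.0, 1.0])          # hypothetical powers
groups, reps = group_users(doas, powers, angular_spread_deg=8.0)

# One weight would then be computed per representative and reused by
# every user in the same group.
user_to_rep = {user: rep for g, rep in zip(groups, reps) for user in g}
```

With six users falling into four groups, only four weight calculations are needed instead of six, which is the source of the complexity saving.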
We use a simulation of a uniform linear array with M = 8
elements and half wavelength inter-element spacing to
verify that the performance loss is acceptable for the
above algorithm. Consider N = 4 sources, one
signal-of-interest (SOI) and three signals-of-non-interest
(SONI), with an initial SOI DOA of 90° and SONI DOAs of
40°, 120° and 140°. Figure 3 compares the SINR error
for conventional beamforming and the target and null
broadening technique.
Figure 3: Downlink SINR comparison for conventional
beamforming method and beamforming using
the broadening technique.
From Figure 3, it is clear that if users are geometrically
close enough, in this case AS < 8°, we can
reuse the same downlink weight $w_{new}$ to save calculations
in the base station at the cost of an acceptable SINR
loss, 3 dB in this case. However, if we also account for
the angular spreading of the interference sources, the
narrow nulls of traditional beamforming lead to a large
performance loss when the co-channel users are offset
from their estimated directions. Figure 4 shows the
performance loss due to offset targeting of the co-channel
users for the previous simulation scenario. It is obvious
that the broadening technique reduces the performance loss
due to co-channel angle spreading.
We use a simulation to demonstrate the complexity
savings of the grouping method. Figure 5 shows
the performance under different angle spreading, where
users are uniformly distributed by angle in a cell.
The results shown in Figure 3 and Figure 5 indicate
that, with proper grouping of the users within a cell, it
is possible to save more than 50% of the downlink
beamforming computational complexity with limited SINR
performance loss.
5. SIMULATIONS
The simulations model a system that uses a linear array
antenna with M = 8 antennas and half wavelength
inter-element spacing, and N = 25 mobile users uniformly
distributed over [0, π) within a cell. Figure 6 shows the
block diagram for conventional downlink beamforming
and the flow chart for the grouping algorithm.
Figure 4: Performance loss due to co-channel users angle
offset.
Figure 5: Group number vs. user number under various
angle spreading conditions.
Based on Figure 6, Table 1 lists the computational load
for each block under the simulated environment model.
It is obvious that the proposed method needs only
about one-third of the typical base station complexity for
calculating R, Q and $w_{down}$. From the entire system
viewpoint, the new method reduces the computational
complexity needed in the base station for SDMA applications
by approximately 50%.
Figure 6: Block Diagram for Conventional Downlink BF
Algorithm and Algorithm with Broadening Technique
(panels: Conventional Downlink BF; Downlink BF With
Broaden and Group Technique)
Figure 7 shows the performance of the grouping plus
target and null broadening scheme, assuming that angle
spreading exists on all sources (desired user and
cochannel interference). The worst-case scenario is when
the target and nulls do not coincide with the estimated
DOAs and are at maximum offset, AS = 8°. Figure 7 shows
that the worst-case SINR loss decreases substantially when
the grouping and broadening scheme is used.
Combining the results of Figure 6 and Figure 7 indicates
the efficacy of the new approach. By grouping the
mobile users in a cell, and using the target and null
broadening technique, the downlink beamforming calculation
is reduced by approximately 50%, with acceptable
performance loss.
Calculation              BF with broadening   Conventional BF   Effort
R                        8                    25                (3)
Q                        8                    25                (4)
a(θ)                     8                    25                (2)
w                        8                    25                (6)
X                        25                   25                X = s*w
Schur Product            8*2                  0                 ⊙
Decision weight Select   25                   0
Table 1: Computational Effort Comparison for Conventional
BF and BF with Group and Broadening Technique
Figure 7: Simulation Result Under N=25; Group with
8°; Target and Interference Both Offset Criteria
6. CONCLUSION
In this paper, we have studied the grouping and target
and null broadening technique for downlink beamforming
in mobile communication systems. Computer
simulations show that grouping users not only alleviates
the DOA estimation error problem, but also offers
robust beamforming performance in the presence of
source movement [8]. Moreover, the computational
complexity in the base station is decreased dramatically,
without significant performance loss for SDMA systems.
REFERENCES
[1] Christof Farsakh and Josef A. Nossek, "Application
of Space Division Multiple Access to Mobile
Radio," IEEE PIMRC, vol. 2, pp. 736-739, September
1994.
[2] Christof Farsakh and Josef A. Nossek, "On The
Mobile Radio Capacity Increase Through SDMA,"
IEEE International Zurich Seminar on Broadband
Comm., pp. 293-297, February 1998.
[3] Per Zetterberg and Bjorn Ottersten, "The Spectrum
Efficiency of a Base Station Antenna Array
System for Spatially Selective Transmission," IEEE
Transactions on Vehicular Technology, vol. 44, no.
3, pp. 651-660, August 1995.
[4] K. I. Pedersen, P. E. Mogensen, B. H. Fleury, "Spatial
Channel Characteristics in outdoor environments
and their Impact on BS Antenna System Performance,"
IEEE VTC, vol. 2, pp. 719-723, August 1998.
[5] Christof Farsakh and Josef A. Nossek, "Spatial
Covariance Based Downlink Beamforming in an
SDMA Mobile Radio System," IEEE Transactions
on Communications, vol. 46, no. 11, pp. 1497-1506,
November 1998.
[6] Klaus Hugl, Juha Laurila and Ernst Bonek, "Downlink
Performance of Adaptive Antennas With Null
Broadening," IEEE VTC, vol. 1, pp. 872-876,
September 1999.
[7] Klaus Hugl, Juha Laurila and Ernst Bonek, "Downlink
Performance for Frequency Division Duplex
Systems," IEEE Globecom, vol. 4, pp. 2097-2101,
December 1999.
[8] Jaume Riba, Jason Goldberg and Gregori Vazquez,
"Robust Beamforming for Interference Rejection in
Mobile Communications," IEEE Transactions on
Signal Processing, vol. 45, no. 1, pp. 271-275,
January 1997.
[9] Michael Tangemann, "Near Far Effects in Adaptive
SDMA Systems," IEEE PIMRC, vol. 3, pp. 1293-1297,
September 1995.
ON THE USE OF CYCLOSTATIONARY FILTERS TO TRANSMIT INFORMATION
Alban DUVERDIER*, Bernard LACAZE** and Jean-Yves TOURNERET*
* CNES, 18 av. Belin, BPI 2012, 31401 Toulouse Cedex 4, France
** ENSEEIHT/SIC, 2, rue Camichel BP7122, 31071 Toulouse Cedex 7, France
tel: +33 (0)5 61 28 31 79 / fax: +33 (0)5 61 28 26 13
email: Alban.Duverdier@cnes.fr
ABSTRACT
Linear periodic time-varying filters are often used today
in telecommunications. They spread the spectrum and
can be used for scrambling, multi-user access or channel
modeling. Recently, the authors have defined linear
cyclostationary filters. In particular, this generalization has
made it possible to take into account the random parameters
of a transmission channel. This paper defines a new case of
linear cyclostationary filter in which information is embedded
in the filter itself.
We first recall the definitions of linear periodic and linear
cyclostationary filters. The paper then presents particular
cases of these filters based on clock changes, and introduces
the modulated periodic clock change. This filter can
be used to transmit an analog and a digital signal
simultaneously. We present a method for reconstructing the
initial signals and give reconstruction results for the
simultaneous transmission of an analog and a binary signal.
1. INTRODUCTION
In telecommunications, signals subjected to a linear periodic
filter [1] [2] are often encountered. Such a filter spreads
the spectrum and can correspond to a scrambling system
[3], a multi-user access method [4] or a transmission channel
model [5]. Recently, it was shown that these filters can be
generalized to linear cyclostationary filters [6].
In the first section, we recall some definitions. In
particular, we present the definition of the linear
cyclostationary filter. We then introduce a new filter called
modulated periodic clock change. It makes it possible to
transmit an analog and a digital signal simultaneously. We
present the reconstruction of the input signals. Finally, we
apply the reconstruction results to the transmission of an
analog and a binary signal.
2. DEFINITIONS
2.1. Stationary and cyclostationary processes
Let $A = \{A(t), t \in \mathbb{R}\}$ be a harmonizable, zero-mean and
mean-square continuous process. A admits a Cramér-Loève
representation $\Theta_A(\omega)$ [7] such that:
$A(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, d\Theta_A(\omega)$ (1)
We denote by $m_A(t)$ and $R_A(t,\tau)$ the mean and autocorrelation
function of A, given by:
$m_A(t) = E[A(t)]$ (2)
$R_A(t,\tau) = E[A(t+\tau/2)\,A^*(t-\tau/2)]$ (3)
The power spectrum of A, $S_{A,t}(\omega)$, is defined by:
$R_A(t,\tau) = \int_{-\infty}^{+\infty} e^{i\omega\tau}\, dS_{A,t}(\omega)$ (4)
A is said to be stationary if and only if $m_A(t)$ and $R_A(t,\tau)$
are independent of t. $dS_{A,t}(\omega)$ is then independent of t.
A is said to be cyclostationary if and only if $m_A(t)$ and
$R_A(t,\tau)$ are periodic in t with period $T = 2\pi/\omega_0$ [8].
$dS_{A,t}(\omega)$ is then periodic in t. We suppose that it admits
the Fourier series decomposition:
$dS_{A,t}(\omega) = \sum_{l=-\infty}^{+\infty} e^{il\omega_0 t}\, dS_A^l(\omega)$ (5)
2.2. Linear time-invariant and periodic time-varying filters
Let h be a linear time-varying filter with frequency response
$h_t(\omega)$. Its response to the stationary process Z is the process
X defined by:
$X(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, h_t(\omega)\, d\Theta_Z(\omega)$ (6)
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
h is a linear time-invariant filter if and only if $h_t(\omega)$ is
independent of time.
h is a linear periodic time-varying filter if and only if
$h_t(\omega)$ is periodic in time with period T [1]. We suppose that
it admits the Fourier series decomposition:
$h_t(\omega) = \sum_{l=-\infty}^{+\infty} e^{il\omega_0 t}\, h^l(\omega)$ (7)
2.3. Linear stationary and cyclostationary filters
The linear random time-varying filter is a generalization of the
linear time-varying filter previously defined [9]. Let
$\{H_\omega\}_{\omega\in\mathbb{R}}$ be a family of complex random
processes, where, for any $\omega$,
$H_\omega = \{H_t(\omega), t \in \mathbb{R}\}$ is a complex continuous
random process. We denote by $\chi_t(\omega)$ the mean and by
$\varphi_{t,\tau}(\omega,\gamma)$ the intercorrelation function of the
$\{H_\omega\}_{\omega\in\mathbb{R}}$. $\chi_t(\omega)$ and
$\varphi_{t,\tau}(\omega,\gamma)$ are given by:
$\chi_t(\omega) = E[H_t(\omega)]$ (8)
$\varphi_{t,\tau}(\omega,\gamma) = E[H_{t+\tau/2}(\omega+\gamma/2)\, H^*_{t-\tau/2}(\omega-\gamma/2)]$ (9)
Let h be a linear random filter with frequency response $H_t(\omega)$.
Its response to the stationary process Z is the process X defined by:
$X(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, H_t(\omega)\, d\Theta_Z(\omega)$ (10)
Thus, each linear filter can be seen as a particular case of linear
random filter, where $H_\omega$ is a degenerate random variable.
A linear random filter h is said to be stationary if and only if the
processes $\{H_\omega\}_{\omega\in\mathbb{R}}$ are jointly stationary.
This means that the mean and the intercorrelation function of the
$\{H_\omega\}_{\omega\in\mathbb{R}}$ are independent of time.
Recently, the authors have generalized this definition [6]. We call h
a linear cyclostationary filter if and only if the processes
$\{H_\omega\}_{\omega\in\mathbb{R}}$ are jointly cyclostationary. This
corresponds to the case where the mean and the intercorrelation
function of the $\{H_\omega\}_{\omega\in\mathbb{R}}$ are periodic in
time with period T.
3. CLOCK CHANGE
3.1. Periodic clock change
The response X of a stationary process Z subjected to a periodic
clock change [3] h is defined by:
$X(t) = g(t)\,Z[t - f(t)]$ (11)
where f(t) and g(t) are real measurable functions, periodic with
period $T = 2\pi/\omega_0$. In equation (11), f(t) is a timing jitter
and g(t) corresponds to an amplitude modulation. It is easy to see
that a periodic clock change is a particular case of linear periodic
filter and that its frequency response is given by:
$h_t(\omega) = g(t)\,e^{-i\omega f(t)}$ (12)
Periodic clock changes are easy to implement. They also appear often
in spread spectrum applications that use linear periodic filters, such
as scrambling [3] and multi-user access [4].
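As an illustration of (11) and (12), the short Python sketch below applies a clock change to a complex exponential and compares the output with the input multiplied by the frequency response. The function names and all numerical values (jitter amplitude, pulsations, test instant) are hypothetical choices made for this example only.

```python
import cmath
import math

def clock_change(z, f, g, t):
    """Output X(t) = g(t) * Z(t - f(t)) of a clock change, as in (11)."""
    return g(t) * z(t - f(t))

def frequency_response(f, g, t, omega):
    """h_t(omega) = g(t) * exp(-i * omega * f(t)), as in (12)."""
    return g(t) * cmath.exp(-1j * omega * f(t))

# Hypothetical parameters, chosen only for this illustration.
OMEGA0 = 2.0 * math.pi      # fundamental pulsation, so that T = 1
ALPHA = 0.1                 # jitter amplitude
OMEGA = 3.0                 # pulsation of the test input

def jitter(t):
    """Periodic timing jitter f(t) with period T."""
    return -ALPHA * math.sin(OMEGA0 * t)

def gain(t):
    """Amplitude modulation g(t); constant here."""
    return 1.0

def tone(t):
    """Test input exp(i * OMEGA * t)."""
    return cmath.exp(1j * OMEGA * t)

t0 = 0.37
x = clock_change(tone, jitter, gain, t0)
expected = tone(t0) * frequency_response(jitter, gain, t0, OMEGA)
print(abs(x - expected))
```

For a complex exponential input, the output equals the input times h_t(omega), and h_t(omega) repeats with period T, which is what makes the clock change a linear periodic filter.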
3.2. Reconstruction of the input signal
Figure 1 depicts the reconstruction chain of a signal subjected to a
periodic clock change.
[Figure 1 block diagram: the information Z(t) enters the periodic clock change, which outputs the observation X(t) = g(t)Z(t - f(t)); the observation is then passed to the reconstruction block.]
Figure 1: Reconstruction chain of a signal subjected to a periodic
clock change
The reconstruction of a process subjected to a periodic clock change
is a particular case of the reconstruction of a process subjected to
a linear periodic filter. Equations (6) and (7) show that the response
X of the stationary process Z subjected to a linear periodic filter h
admits the following spectral representation, where $\psi_k$ denotes
the k-th Fourier coefficient of the frequency response in (7):
$d\Theta_X(\omega) = \sum_{k=-\infty}^{+\infty} \psi_k(\omega - k\omega_0)\, d\Theta_Z(\omega - k\omega_0)$ (13)
When the spectral support of Z is included in
$[-\omega_0/2, +\omega_0/2[$, Z can then be reconstructed by:
$\forall\omega \in [-\omega_0/2, \omega_0/2[,\ \forall k \in \Lambda,\quad d\Theta_Z(\omega) = \psi_k^{-1}(\omega)\, d\Theta_X(\omega + k\omega_0)$ (14)
where $\Lambda$ is the set of integers k such that the functions
$\{\psi_k(\omega)\}_{k\in\Lambda}$ are different from zero on the
spectral support of Z. Multiple redundant reconstructions of Z can
also be obtained by a frequency downconversion followed by a lowpass
filtering on $[-\omega_0/2, \omega_0/2[$.
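The step from (6) and (7) to (13) can be sketched as follows. This is only an outline: it assumes that the sum and the integral can be interchanged, and it writes $\psi_k$ for the k-th Fourier coefficient of the frequency response in (7).

```latex
\begin{align*}
X(t) &= \int_{-\infty}^{+\infty} e^{i\omega t}
        \sum_{k=-\infty}^{+\infty} e^{ik\omega_0 t}\,\psi_k(\omega)\, d\Theta_Z(\omega) \\
     &= \sum_{k=-\infty}^{+\infty} \int_{-\infty}^{+\infty}
        e^{i(\omega + k\omega_0)t}\,\psi_k(\omega)\, d\Theta_Z(\omega) \\
     &= \int_{-\infty}^{+\infty} e^{i\omega' t}
        \sum_{k=-\infty}^{+\infty} \psi_k(\omega' - k\omega_0)\, d\Theta_Z(\omega' - k\omega_0),
        \qquad \omega' = \omega + k\omega_0 .
\end{align*}
```

Identifying the last line with the spectral representation $X(t) = \int e^{i\omega t}\, d\Theta_X(\omega)$ gives the expression of $d\Theta_X(\omega)$ in (13).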
3.3. Modulated periodic clock change
The paper proposes a new clock change scheme that makes it possible
to transmit analog and digital information simultaneously. This
spread spectrum technique is a generalization of the classic periodic
clock change. It can be useful, for example, to scramble video with
an analog image and digital sound. It is called modulated periodic
clock change.
The response X of a stationary process Z subjected to such a clock
change h is defined by:
$X(t) = g(t)\,Z[t - M(t)f(t)]$ (15)
where f(t) and g(t) are defined as in (11) and
$M = \{M(t), t \in \mathbb{R}\}$ is a stationary process independent
of Z. Figure 2 depicts the resulting transmission chain.
[Figure 2 block diagram: the analog information Z(t) and the digital information M(t) enter a clock change with modulated periodic function, whose output is X(t) = g(t)Z(t - M(t)f(t)).]
Figure 2: Transmission chain of a signal subjected to a modulated
periodic clock change
It is easy to see that Z is then subjected to a cyclostationary
filter with frequency response given by:
$H_t(\omega) = g(t)\,e^{-i\omega M(t)f(t)}$ (16)
In general, the reconstruction of Z(t) can be obtained by a
sub-optimal solution [6]. Nevertheless, perfect reconstruction is
possible when M is a Bernoulli variable taking the values -1 or +1.
In this case, equation (14) becomes:
$\forall\omega \in [-\omega_0/2, \omega_0/2[,\ \forall k \in \Lambda,\quad d\Theta_Z(\omega) = \psi_k^{-1}(M\omega)\, d\Theta_X(\omega + k\omega_0)$ (17)
Let $k_1$ and $k_2$ be two values of k. Equation (17) implies that:
$\forall\omega \in [-\omega_0/2, \omega_0/2[,\quad \psi_{k_1}^{-1}(M\omega)\, d\Theta_X(\omega + k_1\omega_0) = \psi_{k_2}^{-1}(M\omega)\, d\Theta_X(\omega + k_2\omega_0)$ (18)
This equality allows the identification of M whenever
$\psi_{k_1}(\omega)$ and $\psi_{k_2}(\omega)$ are not simultaneously
even functions. Knowing M, Z(t) can be perfectly reconstructed using
(17).
This method can then be used for any binary signal M(t) whose symbol
period is much larger than T. It could also be generalized to any
digital signal M(t).
4. APPLICATION
4.1. Simultaneous transmission of analog and binary information
In the following simulations, a modulated periodic clock change is
used to transmit simultaneously an analog signal Z(t) band-limited to
$[-\omega_0/2, \omega_0/2[$ and an N.R.Z. signal M(t). f(t) and g(t)
are given by:
$f(t) = -\alpha \sin\omega_0 t$ and $g(t) = 1$ (19)
Figure 3 depicts the analog signal at the input of the clock change.
Figure 3: Initial analog signal
The binary signal is shown in Figure 4.
Figure 4: Initial binary signal
The signal observed at the output of the clock change is shown in
Figure 5 for $\alpha = 0.104$, T = 0.0347 ms and a bit rate of 1 kb/s.
Figure 5: Observed signal
4.2. Reconstruction of the analog information
We have seen that Z(t) has to be reconstructed while M(t) is
constant. As M(t) is a binary signal, the reconstruction functions of
Z(t) are given during each bit length by:
$\psi_k(M\omega) = J_k(M\alpha\omega)$ (20)
where $J_k$ is the Bessel function of the first kind of order k and M
is the value of M(t), equal to +1 or -1. Since
$J_k(-x) = (-1)^k J_k(x)$, the reconstruction of Z(t) does not depend
on M when k is even. It can then be obtained directly around any even
k. Figure 6 compares the initial signal to the reconstruction
obtained for k = 0. The analog information is well reconstructed.
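The parity property behind this remark can be checked numerically. The sketch below is not taken from the paper: it evaluates the Bessel function of the first kind through its standard integral representation, and the values chosen for alpha and omega are hypothetical.

```python
import math

def bessel_j(k, x, steps=2000):
    """J_k(x) = (1/pi) * integral over [0, pi] of cos(k*tau - x*sin(tau)),
    evaluated with the trapezoidal rule."""
    h = math.pi / steps
    total = 0.0
    for n in range(steps + 1):
        tau = n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * math.cos(k * tau - x * math.sin(tau))
    return total * h / math.pi

def reconstruction_function(k, m, alpha, omega):
    """psi_k(M * omega) = J_k(M * alpha * omega), as in (20)."""
    return bessel_j(k, m * alpha * omega)

# Hypothetical operating point, used only to illustrate the parity in M.
ALPHA = 0.104
OMEGA = 5.0

for k in (0, 1, 2):
    plus = reconstruction_function(k, +1, ALPHA, OMEGA)
    minus = reconstruction_function(k, -1, ALPHA, OMEGA)
    print(k, plus, minus)
```

For even k the two values coincide, so Z(t) can be reconstructed without knowing M; for odd k they have opposite signs, which is what makes the odd orders informative about M.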
4.3. Reconstruction of the binary information
Since a correct reconstruction of Z(t) is known from the even values
of k, the reconstructions obtained for odd k indicate whether M(t)
has been correctly identified. Figures 7 and 8 compare the initial
signal to the reconstruction for k = 1, when M(t) is assumed to be
always equal to +1 and when M(t) is correctly identified.
The block diagram of Figure 9 shows a scheme which allows Z(t) to be
reconstructed and the values of M(t) to be recovered, assuming
perfect timing of the corresponding bit stream.
[Figure 9 block diagram: the observed signal X(t) = g(t)Z(t - M(t)f(t)) feeds three blocks: reconstruction of Z(t) for k even, reconstruction of Z(t) for k odd with M = 1, and reconstruction of Z(t) for k odd with M = -1. The outputs are the estimation of Z(t) and, after a decision over a bit period, the estimation of M(t).]
Figure 9: Scheme for the estimation of Z(t) and M(t)
5. CONCLUSION
In this paper, we recalled the definitions of a linear periodic
filter and of a linear cyclostationary filter. We presented a new
filter called modulated periodic clock change. We proposed a method
for reconstructing the signals transmitted by this filter. It was
applied successfully to the simultaneous transmission of an analog
and a binary signal.
6. REFERENCES
[1] L.E. Franks, "Polyperiodic Linear Filtering", in Cyclostationarity
in Communications and Signal Processing, William A. Gardner (ed.),
IEEE Press, 1993
[2] D. McLernon, "Inter-relationships between different structures
for periodic systems", EUSIPCO, 1998
[3] A. Duverdier and B. Lacaze, "Time-varying reconstruction of
stationary processes subjected to analogue periodic scrambling",
ICASSP, 1997
[4] A. Duverdier and B. Lacaze, "Transmission of two users by means
of periodic clock changes", ICASSP, 1998
[5] R.G. Gallager, Information Theory and Reliable Communication,
Wiley, 1968
[6] A. Duverdier, B. Lacaze and D. Roviras, "Introduction of linear
cyclostationary filters to model time-variant channels", GLOBECOM,
1999
[7] H. Cramer and M.R. Leadbetter, Stationary and Related Stochastic
Processes, Wiley, 1967
[8] W.A. Gardner and L.E. Franks, "Characterization of
cyclostationary random signal processes", IEEE Trans. Inform. Theory,
pp. 4-14, 1975
[9] P.A. Bello, "Characterization of randomly time variant linear
channels", IEEE Trans. Comm., pp. 360-393, 1963
NON-PARAMETRIC TRELLIS EQUALIZATION
IN THE PRESENCE OF NON-GAUSSIAN INTERFERENCE
Carlo Luschi*, Bernard Mulgrew
* Bell Laboratories, Lucent Technologies
Unit 1, Pagoda Park, Westmead Drive, Swindon SN5 7YT, United Kingdom
Dept of Electronics and Electrical Engineering, University of Edinburgh
The King’s Buildings, Mayfield Road, Edinburgh EH9 3JL, United Kingdom
ABSTRACT
We consider the problem of equalization of the frequency selective
mobile radio channel in the presence of co-channel interference
(CCI). Conventional trellis equalizers treat the sum of noise and
interference as additive white Gaussian noise, while CCI is generally
a colored non-Gaussian process. We propose a non-parametric approach
based on the estimation of the probability density function of the
noise-plus-interference. Given the availability of a limited volume
of data, the density is estimated by kernel smoothing techniques. Due
to the temporal color of the CCI, the use of a whitening filter is
also addressed. Simulation results are given for the GSM system,
showing a significant performance improvement with respect to the
equalizer based on the Gaussian assumption.
1. INTRODUCTION
Time-division multiple access (TDMA) mobile radio systems like GSM
are affected by co-channel interference (CCI) and intersymbol
interference (ISI) due to multipath propagation. Channel equalizers
commonly employed in practical GSM receivers perform maximum
likelihood (ML) [1] or maximum a posteriori probability (MAP) [3]
data estimation on the ISI trellis. ML sequence estimation using the
Viterbi algorithm [2] is well known as the optimum detection
technique for signals corrupted by finite-length ISI and additive
white Gaussian noise (AWGN), in the sense that it minimizes the
probability of a sequence error. The symbol-by-symbol MAP algorithm,
proposed over two decades ago by Bahl et al. [3] for decoding of
convolutional codes, has recently received renewed interest as a
soft-in/soft-out decoder for iterative decoding of parallel or
serially concatenated codes [4]. As a trellis equalizer, the MAP
algorithm is optimum in the sense that it minimizes the probability
of symbol error. In receivers employing the concatenation of an
equalizer and a channel decoder, the performance is improved by
soft-decision decoding and iterative equalization and decoding [5].
In this respect, the MAP algorithm has the advantage of intrinsically
providing the optimal a posteriori probability as a soft-output
value.
In this paper, we consider the problem of equalization of the mobile
radio channel in the case of single channel reception. The optimum
trellis equalizer in the presence of ISI, CCI, and AWGN is based on
joint detection of the co-channel signals [7]. Although joint ML and
joint MAP detection are optimal, they can be prohibitively expensive
since the complexity increases exponentially with the sum of the
channel lengths of the desired and CCI signals. In addition, the
estimation of the channel impulse response of all co-channel signals
requires the knowledge of the training sequence of each interferer.
On the other hand, conventional receivers employ a trellis equalizer
which treats the sum of noise and interference as additive, white,
Gaussian noise. In reality, the sum of noise and CCI is generally a
colored non-Gaussian random process, and the above approach results
in a degradation of the error performance.
In order to correctly set the problem of trellis data estimation, a
proper statistical characterization of the disturbance is required.
To this purpose, we propose a non-parametric trellis equalizer, based
on the estimation of the probability density function of the
noise-plus-interference. Given the limited volume of training data,
the work is based on the application of density estimation by kernel
smoothing. The temporal color of the CCI is taken into account by a
whitening filter.
2. MAP TRELLIS EQUALIZATION
2.1. System Model
Consider the received signal
$r_k = \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell^{(k)} + n_k$ , (1)
where $b_k \in \{+1,-1\}$ are the transmitted symbols, the L complex
tap-gains $h_\ell^{(k)}$ represent the samples of the equivalent
channel impulse response at time k, and $n_k = y'_k + w_k$ indicates
the sum of co-channel interference and thermal noise. In the case of
the GSM system, we consider the linearized model of the GMSK signal
[8], where $h_\ell^{(k)}$ are the taps of the equivalent discrete-time
channel produced by derotation of the received signal [9]. The GSM
signal has almost zero excess bandwidth, and we assume that
sufficient statistics for data estimation can be obtained by
symbol-rate sampling at the output of a fixed front-end filter. The
analysis can be extended to include the case of non-zero excess
bandwidth by introducing oversampling and fractionally spaced trellis
equalization.
In this Section, we consider the CCI samples as independent complex
non-Gaussian random variables. The discrete-time process $y'_k$ is
generally colored, even if the delay spread in a typical
interference-limited environment is usually relatively small. At high
signal-to-noise ratios (SNRs) a suitable temporal prewhitening is
assumed to produce an approximately independent non-Gaussian
disturbance. The validity of this assumption will be discussed in
Section 3.
2.2. Symbol-by-Symbol MAP Algorithm for Finite-Length ISI and
Additive Independent Disturbance
Suppose that the symbols $b_k$ are transmitted in finite blocks of
length N. Assuming knowledge of the channel impulse response, a
soft-output symbol-by-symbol MAP equalizer computes the a posteriori
log-likelihood ratio
$L(b_k | r_0,\dots,r_{N-1}) = \log \frac{\Pr(b_k = +1 | r_0,\dots,r_{N-1})}{\Pr(b_k = -1 | r_0,\dots,r_{N-1})}$ (2)
with $0 \le k \le N-1$. Let $\mu_k = (b_{k-1},\dots,b_{k-L+1})$ denote
the generic ISI state at time k, and $S(b_k)$ the set of states
corresponding to the transmitted symbol $b_k$. Indicating by $\xi_k$
the transition from the state $\mu_k$ to $\mu_{k+1}$, the MAP
algorithm results in forward and backward recursions with the
transition metric $\lambda(\xi_k)$, coupled by a dual-maxima
operation [3], [6]
$L(b_k | r_0,\dots,r_{N-1}) = \max'_{\mu \in S(b_k=+1)} \Lambda(\mu_{k+1}) - \max'_{\mu \in S(b_k=-1)} \Lambda(\mu_{k+1})$ , (3)
$\Lambda(\mu_{k+1}) = \Lambda^f(\mu_k) - \lambda(\xi_k) + \Lambda^b(\mu_{k+1})$ , (4)
where $\Lambda(\mu_k)$ is the overall accumulated metric for the state
$\mu_k$, $\Lambda^f$ and $\Lambda^b$ are the accumulated metrics in
the forward and backward recursions, and
$\max'\{x,y\} = \max\{x,y\} + \log(1 + e^{-|x-y|})$ [6]. The metric
increment $\lambda(\xi_k)$ is
$\lambda(\xi_k) = -\log p(r_k | b_k,\dots,b_{k-L+1}) - \log\Pr(b_k)$ , (5)
where $p(r_k | b_k,\dots,b_{k-L+1}) = p_n(r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell^{(k)})$.
In the case where $n_k$ is modelled as AWGN, the quantity
$-\log p(r_k | b_k,\dots,b_{k-L+1})$ in (5) produces the Euclidean
distance metric. When no a priori information is available about the
transmitted bit $b_k$, the term $-\log\Pr(b_k)$ in (5) has no effect
and can be omitted from the calculation. On the contrary, if the
equalizer receives some a priori information, the above term has a
fundamental role in deriving a soft-in/soft-out MAP equalizer [4],
[5].
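The max' operation used in (3) is the exact logarithm of a sum of two exponentials. The following Python sketch shows it together with a dual-maxima log-likelihood ratio computed from two lists of accumulated metrics; the helper names and the metric values are hypothetical and serve only as an illustration.

```python
import math

def max_star(x, y):
    """max'{x, y} = max{x, y} + log(1 + exp(-|x - y|)) = log(exp(x) + exp(y))."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def max_star_all(values):
    """Apply max' recursively over a non-empty sequence of metrics."""
    acc = values[0]
    for value in values[1:]:
        acc = max_star(acc, value)
    return acc

def dual_maxima_llr(metrics_plus, metrics_minus):
    """Log-likelihood ratio in the form of (3): max' over the metrics
    associated with b_k = +1 minus max' over those associated with b_k = -1."""
    return max_star_all(metrics_plus) - max_star_all(metrics_minus)

# Hypothetical accumulated metrics for the two hypotheses on b_k.
llr = dual_maxima_llr([-1.2, -0.3, -4.0], [-2.5, -3.1, -0.9])
print(llr)
```

Dropping the correction term `log1p(...)` and keeping only `max` gives the usual max-log approximation, at the cost of some accuracy in the soft output.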
Observe that the above derivation relies on the assumption of a known
channel. In practice, the channel response is usually estimated using
a known training sequence at the equalizer start-up.
3. TRELLIS EQUALIZATION BY
NON-PARAMETRIC DENSITY ESTIMATION
3.1. Density Estimation by Kernel Smoothing
An example of the density function of the noise plus CCI samples
$n_k$ for the case of the GSM channel is shown in Figure 1. The plot
has been obtained by a histogram of the data in 2000 bursts,
considering one dominant interferer under stationary propagation
conditions. From Figure 1, it is apparent that the disturbance cannot
be realistically modelled as a Gaussian random variable.
Figure 1: Example of the density function of CCI (derotated GMSK
signal) plus AWGN for a GSM receiver (GMSK signal, GSM TU profile).
3.1.1. Parzen Estimator
An estimate of the probability density function of a complex random
variable X can be built from a set of data $X_i$, $i = 1,\dots,n$, by
means of a smoothing function or kernel function $K(x, X_i)$ (see
[11] and references therein). In the method proposed by Parzen [10],
an estimate of the unknown density is given by
$\hat{p}_n(x) = \frac{1}{n} \sum_{i=1}^{n} K(x, X_i)$ . (6)
A possible choice for the function $K(x, X_i)$ among those satisfying
the conditions for (asymptotic) unbiasedness and consistency of the
estimator [10] is the Gaussian kernel of fixed width $\sigma_0$
$K(x, X_i) = \frac{1}{2\pi\sigma_0^2}\, e^{-|x - X_i|^2 / 2\sigma_0^2}$ . (7)
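A minimal Python sketch of the fixed-width estimator is given below. It assumes that the kernel is the circular complex Gaussian density with variance 2*sigma0^2, normalized by 2*pi*sigma0^2; the function names and the sample values are hypothetical.

```python
import math

def gaussian_kernel(x, xi, sigma0):
    """Circular complex Gaussian kernel of fixed width sigma0 (assumed form)."""
    distance_sq = abs(x - xi) ** 2
    return math.exp(-distance_sq / (2.0 * sigma0 ** 2)) / (2.0 * math.pi * sigma0 ** 2)

def parzen_density(x, samples, sigma0):
    """Parzen estimate: average of the kernels centred on the observations."""
    return sum(gaussian_kernel(x, xi, sigma0) for xi in samples) / len(samples)

def log_density(x, samples, sigma0, floor=1e-300):
    """log of the estimate, floored to avoid log(0) far from every sample."""
    return math.log(max(parzen_density(x, samples, sigma0), floor))

# Hypothetical disturbance observations clustered around two points.
observations = [1.0 + 0.1j, 0.9 - 0.1j, -1.0 + 0.05j, -1.1 - 0.1j]
print(parzen_density(1.0 + 0.0j, observations, 0.2))
print(parzen_density(0.0 + 0.0j, observations, 0.2))
```

With these two clusters the estimate is larger near the clusters than at the origin, a shape that a single Gaussian fitted to the same data would not reproduce.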
3.1.2. Transition Metrics for Non-Parametric Trellis Equalization
In the case of a Bayesian trellis equalizer, the random variable X
represents one realization of the process of noise-plus-interference
corresponding to a given received burst. Consider the received signal
(1), and assume that the channel is approximately constant within the
burst duration. Then, once the channel taps $h_\ell$ are estimated
using the M training symbols $b_i$, they can be used to derive the
set of observations $X_i$, $i = 1,\dots,n = M - L$, of the random
disturbance X according to
$X_i = \hat{n}_i = r_i - \sum_{\ell=0}^{L-1} b_{i-\ell}\, \hat{h}_\ell$,
where the hat denotes the estimated value. At this point we recall
that the transition metric (5) of the optimum symbol-by-symbol MAP
algorithm is
$\lambda(\xi_k) = -\log p_n(r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell) - \log\Pr(b_k)$.
Therefore, using (6) and (7) one can directly estimate the quantity
$\log p_n(x)$ for
$x = \hat{n}_k = r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, \hat{h}_\ell$,
and obtain
$\hat{\lambda}(\xi_k) = -\log \hat{p}_n(x) - \log\Pr(b_k)$ . (8)
The block diagram of the resulting equalizer is shown in Figure 2.
From the implementation point of view, the density
$\log \hat{p}_n(x)$ at time k can be computed separately for each
trellis branch. Alternatively, it can be precomputed for a finite
number of values x, and stored in a look-up table before starting the
trellis processing.
Figure 2: Block diagram of the non-parametric trellis equalizer.
We emphasize the fact that the above technique deals with the
statistical model of a random variable, obtained as the realization
of the noise-plus-interference process at a given time instant. It is
worth noting that, with a proper adaptive procedure, the approach can
be extended to those cases where the CCI impulse response cannot be
considered approximately constant within the burst.
3.2. Probability Density Function of the Noise-plus-interference
The analytical expression of the actual density function of the
noise-plus-interference can be derived if we assume an (unknown)
deterministic finite-state machine model for the co-channel signal.
Consider the received signal (1). The sum of noise and CCI at time k
can be expressed as
$n_k = y'_k + w_k = \sum_{\ell=0}^{L'-1} b'_{k-\ell}\, h'^{(k)}_\ell + w_k$ , (9)
where $b'_k \in \{+1,-1\}$ are the co-channel symbols,
$h'^{(k)}_\ell$, $0 \le \ell \le L'-1$, denote the taps of the
co-channel impulse response, and $w_k$ is white Gaussian noise with
zero mean and variance $2\sigma^2$, which we assume independent of
$y'_k$. If the co-channel taps $h'^{(k)}_\ell$ at time k are regarded
as an unknown, but deterministic mapping from
$(b'_k,\dots,b'_{k-L'+1})$ to $y'_k$, the distribution of $n_k$ can be
derived from those of $b'_k$ and $w_k$. Given a generic binary
quantity $\beta$, we define
$\eta_i = \eta_{i,1} + j\eta_{i,2} = \sum_{\ell=0}^{L'-1} \beta_{i,\ell}\, h'^{(k)}_\ell$ , $0 < i \le 2^{L'}$ , (10)
where $\beta_i = \{\beta_{i,\ell}\}_{\ell=0}^{L'-1}$ denotes one of
the $2^{L'}$ distinct sequences of elements
$\beta_{i,\ell} \in \{+1,-1\}$. Then, it is possible to show that the
density of $n_k$ is
$p_n(x) = \frac{1}{2^{L'}} \sum_{i=1}^{2^{L'}} p_w(x - \eta_i)$ , (11)
where $p_w(x)$ is the complex Gaussian density with variance
$2\sigma^2$. From (11), the density of the interference-plus-noise is
given by a number of symmetric Gaussian kernels, whose centers are
the points of the hypothetical scatter diagram obtained in the
absence of thermal noise. Comparison of (11) and (6) reveals the
strong connection between the structure of the Parzen estimator and
the true density. In particular, for $\sigma^2 \to 0$, the
observations $X_i$ in (6) correspond to the points of the complex
plane defined by (10), with the binary parameters $\beta_{i,\ell}$
replaced by the co-channel symbols $b'_{k-\ell}$. Therefore, the
estimator defined by (6) and (7) will approach the true density (11)
as soon as the dimension of the training data is large enough to
represent the $2^{L'}$ equiprobable sequences
$\beta_i = \{\beta_{i,\ell}\}_{\ell=0}^{L'-1}$.
3.3. Doubling the Size of the Training Set
We observe that in (10) for each index i = i' corresponding to the
binary sequence $\beta_{i'} = \{\beta_{i',\ell}\}_{\ell=0}^{L'-1}$
there is an index i = i'' with
$\beta_{i''} = \{-\beta_{i',\ell}\}_{\ell=0}^{L'-1} = -\beta_{i'}$.
This means that for each i' there is an i'' such that
$\eta_{i'} = -\eta_{i''}$. Exchanging each pair of indexes i' and i''
in the sum (11) and taking into account the symmetry of the Gaussian
density $p_w(x)$ gives
$p_n(-x) = (1/2^{L'}) \sum_{i=1}^{2^{L'}} p_w(-x + \eta_i) = p_n(x)$.
The importance of this result comes from the fact that it allows the
available volume of data in the density estimator (6) to be doubled.
In fact it implies that, if $\{X_i\}$ are values assumed by the
random variable $n_k$, then the set $\{-X_i\}$ contains values
assumed by $n_k$ with the same probability. Therefore, together with
each outcome $X_i$ we can additionally consider $-X_i$ as if it were
the result of a parallel experiment. This leads to the enlarged data
set $\{X_i, -X_i\}$.
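The effect of the enlarged data set can be illustrated with a few lines of Python. The estimator below assumes a circular complex Gaussian kernel of fixed width, and the observation values are hypothetical; the point of the sketch is only that the estimate built on the symmetrized set is even in x, like the density it approximates.

```python
import math

def parzen_density(x, samples, sigma0):
    """Fixed-width Gaussian-kernel estimate of a complex-valued density
    (assumed kernel: circular complex Gaussian, variance 2 * sigma0**2)."""
    norm = 2.0 * math.pi * sigma0 ** 2
    total = sum(math.exp(-abs(x - xi) ** 2 / (2.0 * sigma0 ** 2)) for xi in samples)
    return total / (norm * len(samples))

def symmetrize(samples):
    """Enlarged data set {X_i, -X_i}."""
    return list(samples) + [-xi for xi in samples]

observations = [0.9 + 0.2j, -0.4 + 0.7j, 0.1 - 1.1j]   # hypothetical values
enlarged = symmetrize(observations)
point = 0.3 - 0.5j

print(len(observations), len(enlarged))
print(parzen_density(point, enlarged, 0.5), parzen_density(-point, enlarged, 0.5))
```

The estimate built on the original observations alone does not have this symmetry in general.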
3.4. Choice of the Smoothing Parameter
An optimal kernel width for the fixed-width density estimator (6) can
be determined through the minimization of the mean integrated square
error (MISE) [11]. In the case of the Gaussian kernel (7) used to
estimate the complex Gaussian density with variance $2\sigma^2$, we
have $\sigma_{0(\mathrm{opt})} = (1/n)^{1/6}\sigma$ [11]. For the
density of the noise-plus-interference, using (11) and applying
Cauchy's inequality we find
$\sigma_{0(\mathrm{opt})} \ge (1/n)^{1/6}\sigma$ . (12)
With a given volume n of training data, the kernel width can then be
selected from the value of the noise variance $\sigma^2$. In a
practical receiver, an estimate of $\sigma^2$ can be derived from the
training sequence, taking into account the estimated channel response
and the measure of the received signal level.
Figure 3: Error performance in the case of known channel. GSM TU0
profile, SNR = 30 dB. Density estimator with fixed kernel width
$\sigma_0 = 0.05$.
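As a small worked example of this rule, the Python sketch below evaluates the reference width n^(-1/6) * sigma for a given number of observations. The function name and the numerical values are hypothetical; the result is only the value that is optimal for a single Gaussian density, which the text above uses as a lower bound for the noise-plus-interference case.

```python
def reference_kernel_width(n, sigma):
    """Kernel width (1/n)**(1/6) * sigma: optimal for a complex Gaussian
    density of variance 2 * sigma**2, lower bound in the mixture case."""
    if n < 1:
        raise ValueError("at least one observation is required")
    return n ** (-1.0 / 6.0) * sigma

# Hypothetical values: 64 observations and a noise standard deviation of 0.2.
width = reference_kernel_width(64, 0.2)
# Using the enlarged data set {X_i, -X_i} doubles n.
width_doubled = reference_kernel_width(128, 0.2)
print(width, width_doubled)
```

Because of the exponent 1/6, the width depends only weakly on n: doubling the data set shrinks it by a factor 2^(-1/6), about 11%.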
3.5. Temporal Whitening
The MAP equalizer with branch metric (8) is based on the assumption
that the samples $n_k$ are independent. Given the temporal color of
the CCI, a whitening filter of the disturbance is needed before the
trellis processor. We point out that a linear prediction-error (LPE)
filter will ideally produce uncorrelated CCI-plus-noise samples, but
this does not necessarily imply independence, since the process
continues in general to be non-Gaussian. In addition, a whitening
filter for the disturbance will inevitably increase the channel
memory for the desired signal. If we do not want to increase the
number of states of the equalizer, the number of taps of the filter
has to be kept small. However, the delay spread of the typical GSM
urban channel is usually lower than 4 symbol intervals. Moreover,
reducing the correlation between the samples will certainly reduce
their 'dependence'. Note that in some particular cases the whitened
disturbance turns out to be actually independent. As an example, this
happens when the variance of the thermal noise tends to zero and the
co-channel is minimum-phase (in fact, in this case the ideal LPE
filter inverts the co-channel).
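The decorrelating effect of a short prediction-error filter can be sketched in Python as follows. The example assumes a first-order predictor, that is a 2-tap filter [1, -a], estimated by least squares, and a synthetic first-order autoregressive disturbance; neither the helper names nor the numerical values come from the paper.

```python
import random

def lpe_coefficient(samples):
    """Least-squares one-step predictor a, such that n[k] ~ a * n[k-1]."""
    num = sum(samples[k] * samples[k - 1].conjugate() for k in range(1, len(samples)))
    den = sum(abs(samples[k - 1]) ** 2 for k in range(1, len(samples)))
    return num / den

def whiten(samples, a):
    """Apply the 2-tap prediction-error filter [1, -a]."""
    return [samples[k] - a * samples[k - 1] for k in range(1, len(samples))]

def lag1_correlation(samples):
    """Magnitude of the normalized lag-1 autocorrelation."""
    num = sum(samples[k] * samples[k - 1].conjugate() for k in range(1, len(samples)))
    den = sum(abs(s) ** 2 for s in samples)
    return abs(num / den)

# Hypothetical colored disturbance: first-order autoregressive complex noise.
rng = random.Random(0)
colored = [0j]
for _ in range(5000):
    drive = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    colored.append(0.8 * colored[-1] + drive)

a_hat = lpe_coefficient(colored)
residual = whiten(colored, a_hat)
print(lag1_correlation(colored), lag1_correlation(residual))
```

The residual is approximately uncorrelated, but as noted above this does not by itself make non-Gaussian samples independent.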
4. SIMULATION RESULTS
The effectiveness of the strategy based on density estimation by
kernel smoothing has been assessed by computer simulation for the
case of a GSM receiver with single channel reception. The GMSK
transmitted symbols are obtained from the source bits by rate 1/2
convolutional encoding and interleaving, according to the GSM
specifications for the full-rate speech traffic channel. The
simulator includes the multipath fading channel with the classical
Doppler spectrum [14], CCI, and thermal noise. Ideal frequency
hopping is implemented. One dominant co-channel interferer is
assumed, characterized by an independent fading process and a random
phase shift with respect to the signal of interest. In all the
simulations SNR = 30 dB. At the receiver, the soft-output data
produced by a 16-state MAP equalizer are deinterleaved and decoded by
a convolutional channel decoder.
Figure 4: Error performance in the case of estimated channel. GSM TU0
profile, SNR = 30 dB. Density estimator with fixed kernel width
$\sigma_0 = 0.05$.
To establish the ultimate performance of the proposed equalizer, we
first consider the ideal case of known channel and relative speed
0 km/h. Figure 3 shows the bit-error rate (BER) performance with GSM
typical urban area (TU) multipath profile for both co-channel
signals. The MAP non-parametric equalizer is compared with the MAP
trellis processor that assumes Gaussian disturbance. The figure also
addresses the effect of doubling the data set for density estimation,
as discussed in Section 3. The results indicate that the
non-parametric equalizer offers a potential improvement of more than
two orders of magnitude in terms of BER at the equalizer output.
Figures 4 to 6 illustrate the receiver performance when the channel
of the signal of interest is estimated from the training symbols. We
also introduce an LPE filter for prewhitening of the colored
disturbance. As discussed in Section 3, choosing the prediction order
involves a trade-off between performance and complexity. In the
figures, we use a 16-state trellis and a 2-tap LPE filter. Finally,
we include the performance obtained by iterative channel estimation.
In this case, after the equalization of the entire burst, the data
decisions are fed back to produce an improved channel estimate, which
is used in a second pass equalization.
The above simulation results refer to a synchronous interference
scenario. Simulation with asynchronous CCI shows that the proposed
equalizer still outperforms the conventional trellis processor.
However, in those cases the proper approach consists in introducing
an adaptation of the estimated density of the noise-plus-CCI.
Figure 5: Error performance with iterative channel estimation. GSM
TU0 profile, SNR = 30 dB. Density estimator with fixed kernel width
$\sigma_0 = 0.05$.
5. CONCLUSIONS
A non-parametric trellis processor has been studied for channel
equalization in the presence of non-Gaussian interference. In the
case of the GSM system, the proposed approach based on density
estimation by kernel smoothing provides a significant performance
improvement with respect to the receiver that assumes Gaussian
disturbance.
Figure 6: Error performance with iterative channel estimation. GSM
TU50 profile, SNR = 30 dB. Density estimator with fixed kernel width
$\sigma_0 = 0.05$.
REFERENCES
[1] G. D. Forney, Jr., "Maximum likelihood sequence estimation of
digital sequences in the presence of intersymbol interference," IEEE
Trans. Inform. Theory, vol. IT-18, no. 3, pp. 363-378, May 1972.
[2] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61,
no. 3, pp. 268-278, Mar. 1973.
[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding
of linear codes for minimizing symbol error rate," IEEE Trans.
Inform. Theory, vol. IT-20, pp. 284-287, Mar. 1974.
[4] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "A
soft-input soft-output APP module for iterative decoding of
concatenated codes," IEEE Commun. Letters, vol. 1, no. 1, pp. 22-24,
Jan. 1997.
[5] G. Bauch, H. Khorram, and J. Hagenauer, "Iterative equalization
and decoding in mobile communications systems," in Proc. Eur. Pers.
Mobile Commun. Conf., (Bonn, Germany), pp. 307-312, Oct. 1997.
[6] A. J. Viterbi, "An intuitive justification and a simplified
implementation of the MAP decoder for convolutional codes," IEEE J.
Select. Areas Commun., vol. 16, no. 2, pp. 260-264, Feb. 1998.
[7] K. Giridhar, J. J. Shynk, A. Mathur, S. Chari, and R. P. Gooch,
"Nonlinear techniques for the joint estimation of cochannel signals,"
IEEE Trans. Commun., vol. 45, no. 4, pp. 473-484, Apr. 1997.
[8] P. Laurent, "Exact and approximate construction of digital phase
modulations by superposition of amplitude modulated pulses (AMP),"
IEEE Trans. Commun., vol. 34, no. 2, pp. 150-160, Feb. 1986.
[9] A. Baier, "Derotation techniques in receivers for MSK-type CPM
signals," in Proc. Eusipco, (Barcelona, Spain), Sept. 1990.
[10] E. Parzen, "On estimation of a probability density function and
mode," Ann. Math. Statist., vol. 33, pp. 1065-1076, 1962.
[11] A. W. Bowman and A. Azzalini, Applied Smoothing Techniques for
Data Analysis. Oxford: Oxford University Press, 1997.
[12] J.-N. Hwang, S.-R. Lay, and A. Lippman, "Nonparametric
multivariate density estimation: A comparative study," IEEE Trans.
Signal Proc., vol. 42, no. 10, pp. 2795-2810, Oct. 1994.
[13] C. Diamantini and A. Spalvieri, "Quantizing for minimum average
misclassification risk," IEEE Trans. Neural Networks, vol. 9, no. 1,
pp. 174-182, Jan. 1998.
[14] J. G. Proakis, Digital Communications. New York: McGraw-Hill,
3rd ed., 1995.
ANALYTICAL BLIND IDENTIFICATION OF A SISO COMMUNICATION
CHANNEL
Olivier GRELLIER and Pierre COMON
Lab. I3S, Algorithmes-Euclide-B, 2000 route des Lucioles
BP 121, F-06903 Sophia-Antipolis cedex, France
grellier@i3s.unice.fr comon@unice.fr
ABSTRACT
In this paper, a novel analytical blind identification algorithm is
presented, based on the non-circular second-order statistics of the
output. It is shown that the channel taps need to satisfy a
polynomial system of degree 2, and that identification amounts to
solving the system. We describe an algorithm able to solve this
particular system entirely analytically. Computer results demonstrate
its efficiency.
1. INTRODUCTION
Blind identification methods depend on the characteristics
of the input sources. For example, it is known that
a system can only be identified up to an all-pass filter
when its input is circular Gaussian. Consequently,
particular attention has been paid to the non-Gaussian
input case. In that situation the phase information
can be accessed using high-order statistics of the
observations, and in the SISO case the system is identified
up to a scalar factor only. This has been studied in
numerous papers, among which one can cite the works
of Shalvi-Weinstein [5] and Tugnait [7].
An interesting class of non-Gaussian signals is the
discrete one, which appears in wireless communications.
The discrete character has been exploited by a few
authors such as Li [3] or Yellin and Porat [8], who were
the first to investigate an algebraic solution. The signals
studied also have nonzero cyclostationary statistics,
which allows identification using second-order statistics
only [4], [6].
The novelty of our contribution is two-fold. First,
non-circular second-order moments are used. Second,
an algebraic solution to a class of polynomial systems,
constructed from a block of data, is introduced. Our
approach is described in the case of MSK modulation,
which closely approximates the digital modulation used
in the GSM standard. In addition, block methods are
well matched to burst-mode communication systems.
2. MODEL, NOTATION, AND
ASSUMPTIONS
Assume a finite sequence of input samples $x(m)$ is fed
into a Finite Impulse Response (FIR) linear system of
length $M$. Denote by $y(n)$ the corresponding output
sequence of length $N$, satisfying:
$$ y(n) = \sum_{m=0}^{M-1} h(m)\, x(n-m) + w(n) \stackrel{\text{def}}{=} \mathbf{x}(n;M)^T \mathbf{h} + w(n) $$
Multidimensional variables are stored in column vectors
and denoted by boldface letters; for instance,
$\mathbf{x}(n;M) = [x(n), \ldots, x(n-M+1)]^T$, by construction.
The input sequence is assumed to follow a discrete
distribution, stemming from BPSK, MSK, or QPSK
digital modulations, and the channel $\mathbf{h}$ is assumed
to be time-invariant during the observation.
The key statistical property used in this paper is
that discrete signals are non-stationary at given orders.
More precisely, for BPSK modulated signals:
$$ E\{x(n)\,x(n-\ell) \mid x(0)\} = x(0)^2\,\delta(\ell) $$
$$ E\{x(n)\,x(n-\ell)^*\} = \delta(\ell) $$
for MSK signals:
$$ E\{x(n)\,x(n-\ell) \mid x(0)\} = (-1)^n\, x(0)^2\,\delta(\ell) $$
$$ E\{x(n)\,x(n-\ell)^* \mid x(0)\} = \delta(\ell) $$
and for QPSK modulated signals:
$$ E\{\mathrm{Re}[x(n)]\,\mathrm{Re}[x(n-\ell)] \mid x(0)\} = \mathrm{Re}[x(0)]^2\,\delta(\ell) $$
$$ E\{\mathrm{Im}[x(n)]\,\mathrm{Im}[x(n-\ell)] \mid x(0)\} = \mathrm{Im}[x(0)]^2\,\delta(\ell) $$
$$ E\{x(n)\,x(n-k)\,x(n-\ell)\,x(n-m) \mid x(0)\} = x(0)^4\,\delta(k+\ell+m) $$
$$ E\{x(n)\,x(n-\ell)^*\} = \delta(\ell), $$
where $\delta(\ell) = 1$ if $\ell = 0$ and $\delta(\ell) = 0$ otherwise.
Note the conditional expectation, which exhibits
cyclostationarity in the non-circular moment of MSK inputs.
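The MSK moments above can be checked numerically. The sketch below is only an illustration: it assumes the usual symbol-rate MSK model $x(n) = j\,b(n)\,x(n-1)$ with i.i.d. $b(n) = \pm 1$, which is not spelled out in the text, and the function names are hypothetical.

```python
import random


def msk_sequence(n_samples, rng, x0=1 + 0j):
    """Assumed symbol-rate MSK model: x(n) = j * b(n) * x(n-1), b(n) = +/-1."""
    x = [x0]
    for _ in range(n_samples - 1):
        x.append(1j * rng.choice((-1, 1)) * x[-1])
    return x


def conditional_moment(lag, n, n_trials=20000, seed=0, conjugate=False):
    """Monte Carlo estimate of E{x(n) x(n-lag) | x(0) = 1}.

    With conjugate=True the second factor is conjugated (circular moment).
    """
    rng = random.Random(seed)
    acc = 0j
    for _ in range(n_trials):
        x = msk_sequence(n + 1, rng)
        second = x[n - lag]
        acc += x[n] * (second.conjugate() if conjugate else second)
    return acc / n_trials


if __name__ == "__main__":
    for n in (4, 5):
        print(n, conditional_moment(0, n), conditional_moment(1, n))
```

Under this model $x(n)^2 = -x(n-1)^2$, so the lag-0 non-circular moment alternates in sign with $n$, while the circular moment stays equal to 1.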
0-7803-5988-7/00/$ 10.00 © 2000 IEEE
Based on these properties, it is possible to derive
a set of polynomial equations that the channel must
satisfy. In the MSK case, we obtain:
$$ E[y(n)\,y(n-\ell) \mid x(0)] = x(0)^2 \sum_{m=0}^{M-1} (-1)^m\, h(m)\, h(m+\ell) $$
In the BPSK case, we have:
$$ E[y(n)\,y(n-\ell) \mid x(0)] = x(0)^2 \sum_{m=0}^{M-1} h(m)\, h(m+\ell) $$
Lastly, in the QPSK case:
$$ E[y(n)\,y(n-\ell_1)\,y(n-\ell_2)\,y(n-\ell_3) \mid x(0)] = x(0)^4 \sum_{m=0}^{M-1} h(m)\, h(m+\ell_1)\, h(m+\ell_2)\, h(m+\ell_3) $$
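As a hedged numerical illustration of the BPSK relation, the sketch below compares a sample average of $y(n)\,y(n-\ell)$ with $\sum_m h(m)\,h(m+\ell)$. It assumes a noiseless real channel, i.i.d. $\pm 1$ inputs (so that $x(0)^2 = 1$), and taps equal to zero outside $0 \le m \le M-1$; the function names and the test channel are hypothetical.

```python
import random


def noncircular_corr(h, lag):
    """Right-hand side of the BPSK relation: sum_m h(m) h(m+lag), no conjugate."""
    return sum(h[m] * h[m + lag] for m in range(len(h) - lag))


def estimate_corr(h, lag, n_samples=100_000, seed=0):
    """Sample average of y(n) y(n-lag) for a noiseless FIR channel, BPSK input."""
    rng = random.Random(seed)
    M = len(h)
    x = [rng.choice((-1.0, 1.0)) for _ in range(n_samples + M - 1)]
    # y(n) = sum_m h(m) x(n-m), computed where all M input samples exist
    y = [sum(h[m] * x[n - m] for m in range(M)) for n in range(M - 1, len(x))]
    total = sum(y[n] * y[n - lag] for n in range(lag, len(y)))
    return total / (len(y) - lag)


if __name__ == "__main__":
    h = [1.0, 0.5, -0.3]  # arbitrary test channel, not taken from the paper
    for lag in range(len(h)):
        print(lag, noncircular_corr(h, lag), estimate_corr(h, lag))
```

The two columns agree up to the estimation error of the sample average, which shrinks as the block length grows.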
3. SOLVING THE POLYNOMIAL SYSTEM
3.1. Example
To introduce our contribution in simple terms,
consider the following example. Let the input signal be
MSK and the channel be real of length $M = 3$. Then
non-circular statistics yield:
$$ h(0)^2 - h(1)^2 + h(2)^2 = f_1 $$
$$ h(0)h(1) - h(1)h(2) = f_2 \qquad (1) $$
$$ h(0)h(2) = f_3 $$
whereas circular ones yield:
$$ h(0)^2 + h(1)^2 + h(2)^2 = g_1 $$
$$ h(0)h(1) + h(1)h(2) = g_2 $$
$$ h(0)h(2) = f_3 $$
where $f_i$ and $g_i$ are given (they depend on the statistics
of the observations $y$). Combining these equations
yields:
$$ h(0)^2 + h(2)^2 = (f_1 + g_1)/2 $$
$$ h(0)h(1) = (f_2 + g_2)/2 $$
$$ h(0)h(2) = f_3 $$
Using the first and third equations, one gets:
$$ (h(0) \pm h(2))^2 = h(0)^2 + h(2)^2 \pm 2\,h(0)h(2) = (f_1 + g_1)/2 \pm 2 f_3 $$
These two relations give $h(0)$ and $h(2)$ up to a common
sign, the order of the two taps being fixed by the
remaining equations, and then $h(1)$.
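The three-tap example can be sketched in a few lines. This is an illustration under stated assumptions, not the general algorithm of the paper: the statistics $f_i$, $g_i$ are taken as exact, the channel is real with $h(0) \neq 0$, and $h(0) \pm h(2)$ are obtained from $h(0)^2 + h(2)^2 \pm 2h(0)h(2)$. The function names are hypothetical.

```python
import math


def stats_from_channel(h):
    """Exact non-circular (f) and circular (g) statistics of a real 3-tap channel."""
    h0, h1, h2 = h
    f = (h0 * h0 - h1 * h1 + h2 * h2, h0 * h1 - h1 * h2, h0 * h2)
    g = (h0 * h0 + h1 * h1 + h2 * h2, h0 * h1 + h1 * h2)
    return f, g


def solve_three_taps(f, g):
    """Recover (h0, h1, h2) up to a global sign from f = (f1, f2, f3), g = (g1, g2)."""
    f1, f2, f3 = f
    g1, g2 = g
    s = (f1 + g1) / 2.0                      # h0^2 + h2^2
    p = math.sqrt(max(s + 2.0 * f3, 0.0))    # |h0 + h2|
    q = math.sqrt(max(s - 2.0 * f3, 0.0))    # |h0 - h2|
    best = None
    # The sign of (h0 - h2) relative to (h0 + h2) decides the tap order;
    # it is resolved by checking the equations that involve h1.
    for d in (q, -q):
        h0 = (p + d) / 2.0
        h2 = (p - d) / 2.0
        if abs(h0) < 1e-12:
            continue
        h1 = (f2 + g2) / (2.0 * h0)          # from h0*h1 = (f2 + g2)/2
        err = abs(h1 * (h0 - h2) - f2) + abs(h1 * (h0 + h2) - g2)
        if best is None or err < best[0]:
            best = (err, (h0, h1, h2))
    return best[1]


if __name__ == "__main__":
    f, g = stats_from_channel((1.0, 0.5, -0.3))
    print(solve_three_taps(f, g))
```

With estimated rather than exact statistics, the consistency check `err` would no longer vanish, and the candidate with the smaller residual is kept.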
Thus we have been able to identify a real channel
by using the non-circular second-order statistics
together with the circular second-order ones. The general
algorithm described in this section computes the
finite set of solutions of the polynomial system built on
the non-circular second-order statistics only. In the
next section, the choice of the channel estimate is
discussed.
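3.2. Preliminaries
Consider the ring $\mathcal{R} = \mathbb{C}[\boldsymbol{\xi}]$ of polynomials in the variables
$\boldsymbol{\xi} \stackrel{\text{def}}{=} [h(0), h(1), \ldots, h(M-1)]$ with coefficients in the
complex field $\mathbb{C}$; the dual space of $\mathcal{R}$ is the set of linear
forms from $\mathcal{R}$ to $\mathbb{C}$, denoted $\widehat{\mathcal{R}}$. The evaluation of a
polynomial $p$ at a point $\zeta \in \mathbb{C}^M$, denoted by $\mathbf{1}_\zeta : p \mapsto p(\zeta)$,
is the linear form in which we are most interested.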
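Given a polynomial $a \in \mathcal{R}$, define the multiplication
operator by $a$ as the mapping $M_a$ that associates $q$
with $aq$:
$$ M_a : \mathcal{R} \to \mathcal{R}, \qquad q \mapsto q\,a \qquad (2) $$
The transposed operator, $M_a^t$, is by definition
the mapping from $\widehat{\mathcal{R}}$ to itself such that
$\langle q, M_a^t \Lambda \rangle = \langle M_a q, \Lambda \rangle = \langle aq, \Lambda \rangle$,
$\forall \Lambda \in \widehat{\mathcal{R}}$, $\forall q \in \mathcal{R}$, so that
$M_a^t(\Lambda)(q) = \Lambda(qa)$.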
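As a short worked consequence of these definitions (written here with $\mathbf{1}_\zeta$ for the evaluation at $\zeta$ and $M_a^t$ for the transposed multiplication operator, a notation assumed for this note), applying the transposed operator to an evaluation form gives:

```latex
\begin{align*}
M_a^t(\mathbf{1}_\zeta)(q)
  &= \mathbf{1}_\zeta(q\,a)          && \text{definition of } M_a^t \\
  &= q(\zeta)\,a(\zeta)              && \text{evaluation of a product} \\
  &= a(\zeta)\,\mathbf{1}_\zeta(q)   && \text{for every polynomial } q .
\end{align*}
```

Hence $M_a^t(\mathbf{1}_\zeta) = a(\zeta)\,\mathbf{1}_\zeta$: every evaluation form is an eigenvector of the transposed operator, with eigenvalue equal to the value of $a$ at the point $\zeta$.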
3.3. Lemmas
Let $\mathcal{V}$ be the subset of $\mathcal{R}$ formed by the polynomials
$\{f_1, \ldots, f_M\}$ of degree $D$. Bezout's theorem [2,
p.227] states that such a system
$$ \mathcal{V} : \{ f_m(\boldsymbol{\xi}) = 0,\ 1 \le m \le M \}, \qquad (3) $$
where $\boldsymbol{\xi} \stackrel{\text{def}}{=} [\xi(0), \xi(1), \ldots, \xi(M-1)]$, has either
infinitely many solutions or a number of solutions less
than or equal to $D^M$.
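To make the bound concrete, here is a hedged sketch for a two-tap analogue of system (1), which is not treated in the text: $h(0)^2 - h(1)^2 = f_1$, $h(0)h(1) = f_2$. Here $D = 2$ and $M = 2$, so at most $D^M = 4$ solutions are expected, and they follow from $(h(0) \pm i\,h(1))^2 = f_1 \pm 2i f_2$. The function name is hypothetical.

```python
import cmath


def solve_two_tap_system(f1, f2):
    """All complex solutions of h0^2 - h1^2 = f1 and h0*h1 = f2."""
    u = cmath.sqrt(f1 + 2j * f2)   # one square root of (h0 + i*h1)^2
    v = cmath.sqrt(f1 - 2j * f2)   # one square root of (h0 - i*h1)^2
    solutions = []
    for su in (1, -1):
        for sv in (1, -1):
            # h0 + i*h1 = su*u and h0 - i*h1 = sv*v
            h0 = (su * u + sv * v) / 2
            h1 = (su * u - sv * v) / 2j
            solutions.append((h0, h1))
    return solutions


if __name__ == "__main__":
    # Statistics generated by the arbitrary test channel h = (1.0, 0.5)
    for h0, h1 in solve_two_tap_system(0.75, 0.5):
        print(h0, h1, h0 * h0 - h1 * h1, h0 * h1)
```

The four sign combinations give the four solutions allowed by the bound; they come in pairs of opposite sign, and the true channel is one of them. For system (1), with $D = 2$ and $M = 3$, the same bound is $8$.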
When the s