Distributed Spatially Non-Stationary Channel Estimation for Extremely-Large Antenna Systems

The purpose of this paper is to develop a distributed channel estimation (CE) algorithm for spatially non-stationary (SNS) channels in extremely large aperture array systems, addressing the high communication cost and computational complexity of traditional centralized algorithms. However, SNS channels differ from conventional spatially stationary channels and present new challenges; for example, different antennas may exhibit different sparsity patterns. To overcome these challenges, we propose a novel distributed CE algorithm accompanied by a simple-yet-effective hard thresholding scheme. The proposed algorithm is suitable not only for uniform antenna arrays but also for irregularly deployed antennas. Simulation results demonstrate the advantages of the proposed algorithm in terms of estimation accuracy, communication cost and computational complexity.


Introduction
A massive multiple-input multiple-output (MIMO) system equips the base station with a much larger number of antennas than conventional MIMO, typically tens or even hundreds [1]. This offers several benefits, such as a significant increase in capacity, simplified scheduling in the frequency domain [1], and the ability to average out interference by the law of large numbers [2]. Therefore, massive MIMO has become a key technology in 5G wireless communication systems operating in sub-6 GHz bands. To fully exploit the spatial multiplexing and array gains of massive MIMO, accurate channel state information (CSI) is essential; it is the foundation of many key physical-layer algorithms such as multi-user detection [3] and beamforming [4][5][6]. In practice, however, CSI has to be estimated from training symbols, and the complexity of doing so scales with the number of transmit antennas.
Traditional channel estimation algorithms are typically based on the centralized baseband processing (CBP) architecture [7][8][9]. However, this architecture faces the challenges of high communication cost and high computational complexity, which call for a powerful and expensive CBP unit that may not be feasible for very large arrays. To address these challenges, the decentralized baseband processing (DBP) architecture has been proposed [10]. In the DBP architecture, the antennas are divided into clusters, each equipped with an independent and more affordable baseband processing unit (BBU) (see Fig. 1). A naive approach under the DBP architecture is for each BBU to estimate the channel solely from its locally received signal. However, this fully decentralized scheme suffers a significant performance loss because it neglects inter-cluster correlations. To address this issue, the DBP architecture leverages advanced distributed signal processing (SP) techniques to achieve promising CE performance while minimizing the inter-BBU communication cost and the BBU computational complexity. In recent years, researchers have explored a range of distributed and decentralized signal processing algorithms for DBP-based massive MIMO systems [10][11][12][13][14][15][16]. However, most of these studies have focused on uplink equalization or downlink precoding, and few have addressed the design of distributed CE algorithms.
To accommodate a larger number of terminals in the future, one possible approach is to deploy thousands of antennas over a specific geographic area, such as along the walls of a building or a stadium [17,18]. This approach, referred to as the extremely large aperture array (ELAA), exhibits channel characteristics that differ from those of conventional MIMO systems. The non-stationarity of such massive MIMO channels manifests in two aspects. First, with a large number of base station (BS) antennas, the distance between the BS array and a scatterer in the propagation channel may become smaller than the Rayleigh distance, rendering the far-field and plane-wavefront assumptions invalid. The wavefront should then be modeled as spherical to account for drifts in the received power, delays, and directions of the incoming/outgoing multipaths along the BS array. Second, scatterer clusters dynamically appear and disappear, so that some clusters are visible to only part of the BS array; this may change the number of multipaths along the array axis and enlarge the variations in received power, delays, and directions of multipaths across different BS antennas [19]. As a result, the ELAA channel may be spatially non-stationary, different antennas may have different sparsity patterns, and channel estimation in ELAA systems faces new challenges.
In this paper, we aim to develop distributed channel estimation (CE) algorithms for spatially non-stationary (SNS) channels in ELAA systems that overcome the high communication cost and computational complexity of the traditional centralized algorithm. Unlike conventional spatially stationary channels, the SNS channel brings new challenges; for example, the sparsity patterns of different antennas may differ. To resolve these challenges, we propose a novel distributed CE algorithm together with a simple-yet-effective hard thresholding scheme. The proposed algorithm is applicable not only to uniform antenna arrays but also to irregularly deployed antennas. Simulation results indicate that the proposed distributed CE algorithm can substantially enhance the accuracy of channel estimation in spatially non-stationary channels.

Spatially Non-Stationary Channel
In this work, we consider channel estimation in a massive MIMO system where a base station (BS) equipped with $N_R$ antennas communicates with a single-antenna user over $N_C$ subcarriers. While we primarily focus on the uplink, the concepts presented in this paper can also be extended to the downlink with appropriate pilot/training designs. Specifically, in the uplink, the received signal at the BS on the $k$-th subcarrier is given by
$$y(k) = h(k) + w(k), \quad k = 1, \dots, N_C,$$
where $y(k), w(k) \in \mathbb{C}^{N_R}$ are the received signal and noise at the receiver, respectively, and $h(k) \in \mathbb{C}^{N_R}$ denotes the channel vector between the BS and the user. By concatenating the received signals of all subcarriers, the received signal can be written in matrix form as
$$Y = [y(1), \dots, y(N_C)] = H + W,$$
where $W \in \mathbb{C}^{N_R \times N_C}$ is the additive white Gaussian noise, and the antenna-and-frequency-domain channel matrix $H$ can be expressed in terms of an antenna-and-delay-domain channel matrix in $\mathbb{C}^{N_R \times N_C}$ through the antenna-domain and delay-domain DFT matrices, whose entries are complex exponentials in $j = \sqrt{-1}$. Massive MIMO systems feature a large antenna array, causing channel parameters such as power, delay spread, angle spread, and the number of clusters to vary over the array. As a result, the wireless channel becomes spatially non-stationary, particularly when the number of antennas is very large. In such channels, some clusters are visible to only a portion of the antenna array [3], so the delay-domain sparsity structure differs among antennas, as illustrated in Fig. 2.
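To illustrate the antenna-and-delay-domain representation and how spatial non-stationarity shows up in it, the following NumPy sketch assumes unitary DFT matrices with entries $e^{-j2\pi pq/N}/\sqrt{N}$ and a delay-domain channel `h_delay` related to the frequency-domain channel by $H = H_{\rm delay} F_{N_C}$. The normalization, the helper names `dft_matrix` and `freq_to_delay`, and the toy two-path channel are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """Unitary n-point DFT matrix, entries exp(-j*2*pi*p*q/n)/sqrt(n) (assumed normalization)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)


def freq_to_delay(h_freq: np.ndarray) -> np.ndarray:
    """Map an N_R x N_C antenna-frequency matrix to the antenna-delay domain.

    Assumes H = H_delay @ F with F unitary, so H_delay = H @ F^H
    (an IDFT along the subcarrier axis; the antenna axis is left untouched).
    """
    return h_freq @ dft_matrix(h_freq.shape[1]).conj().T


# Toy spatially non-stationary channel: N_R = 8 antennas, N_C = 32 subcarriers.
rng = np.random.default_rng(0)
n_rx, n_c = 8, 32
h_delay = np.zeros((n_rx, n_c), dtype=complex)
# Path at delay tap 10 is visible to every antenna.
h_delay[:, 10] = rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)
# Path at delay tap 3 is visible only to the first half of the array.
half = n_rx // 2
h_delay[:half, 3] = rng.standard_normal(half) + 1j * rng.standard_normal(half)

h_freq = h_delay @ dft_matrix(n_c)          # antenna-frequency channel H
recovered = freq_to_delay(h_freq)           # back to the antenna-delay domain
support = [np.flatnonzero(np.abs(row) > 1e-9).tolist() for row in recovered]
print(support[0], support[-1])              # [3, 10] [10]: per-antenna sparsity differs
```

Each row of `support` lists the delay taps that one antenna actually sees, which is exactly the antenna-dependent sparsity pattern sketched in Fig. 2.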

Signal Model Based on DBP Architecture
Under the DBP architecture, the antennas are divided into multiple non-overlapping antenna clusters, and each cluster is connected to a dedicated BBU that processes its received signal. Each antenna cluster and its BBU are viewed as a local node. In this work, we consider a star network, in which a central node coordinates the local nodes to perform distributed channel estimation that approaches the centralized scheme; the local nodes can only communicate with the central node, as shown in Fig. 1.
We assume that each antenna cluster consists of $N_r = N_R/M$ antennas, where $M$ is the number of clusters. Then, the received antenna-and-frequency-domain signal at the $m$-th node is given by
$$Y_m = H_m + W_m, \quad m \in \mathcal{M},$$
where $\mathcal{M} \triangleq \{1, \dots, M\}$, and $H_m \in \mathbb{C}^{N_r \times N_C}$ and $W_m$ are the antenna-and-frequency-domain channel and noise at node $m$, respectively. We assume that each column of $W_m$ follows $\mathcal{CN}(0, \sigma_w^2 I_{N_r})$ and is independent of the other columns. The goal of distributed CE is to recover each $H_m$ from the received signals $\{Y_m\}_{m \in \mathcal{M}}$ of all nodes.
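The partition of the array into $M$ local nodes can be sketched as follows. The helper names `split_into_clusters` and `complex_awgn` are hypothetical, and assigning contiguous antennas to each cluster is an assumption made for illustration; the model above only requires non-overlapping clusters of $N_r = N_R/M$ antennas.

```python
import numpy as np


def complex_awgn(shape, sigma2, rng):
    """Circularly symmetric complex Gaussian samples with per-entry variance sigma2."""
    scale = np.sqrt(sigma2 / 2)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def split_into_clusters(mat: np.ndarray, num_clusters: int) -> list:
    """Split the N_R rows (antennas) into M contiguous, non-overlapping blocks of N_r rows."""
    n_rx = mat.shape[0]
    if n_rx % num_clusters != 0:
        raise ValueError("N_R must be divisible by M so that N_r = N_R / M is an integer")
    n_r = n_rx // num_clusters
    return [mat[m * n_r:(m + 1) * n_r, :] for m in range(num_clusters)]


rng = np.random.default_rng(1)
n_rx, n_c, num_clusters, sigma2 = 16, 8, 4, 0.1
h = complex_awgn((n_rx, n_c), 1.0, rng)     # placeholder channel, used for shapes only
w = complex_awgn((n_rx, n_c), sigma2, rng)  # each column ~ CN(0, sigma2 I)
y = h + w                                   # Y = H + W

# Node m only ever sees Y_m = H_m + W_m.
y_local = split_into_clusters(y, num_clusters)
h_local = split_into_clusters(h, num_clusters)
w_local = split_into_clusters(w, num_clusters)
print([blk.shape for blk in y_local])       # [(4, 8), (4, 8), (4, 8), (4, 8)]
```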

Baseline Schemes
Before presenting the proposed algorithm, we first revisit the MMSE scheme and the diagonal MMSE scheme, both of which are centralized. MMSE Scheme: The MMSE solution is obtained by solving optimization problem (7), in which $U$ is the matrix optimized to minimize the mean square error, $\|\cdot\|_F$ denotes the Frobenius norm, and $y$ and $h$ are the vectorized received signal and channel, respectively. Note that (7) is a quadratic optimization problem, so the optimal $U$ can be obtained by setting the first-order derivative of the objective function to zero, as given in (9), where (9b) follows from the assumption that $h$ and $w$ are independent. This yields the MMSE channel estimator (10), where $\mathrm{mat}(y)$ is the inverse of matrix vectorization, which reshapes the vector $y$ into a matrix with the same dimensions as $Y$. Diagonal MMSE Scheme: In this scheme, the optimization problem becomes (11). Different from problem (7), $U$ in (11) is restricted to be diagonal, so each channel element is estimated individually. By optimizing problem (11), we arrive at the solution (12), which is expressed in terms of $[R_h]_\ell$, the $\ell$-th element of $R_h$. The channel estimate is then obtained in the same way as in the MMSE scheme.
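The difference between the two baselines can be made concrete on the vectorized model $y = h + w$ with $w \sim \mathcal{CN}(0, \sigma_w^2 I)$ independent of $h$. The sketch below assumes the standard linear MMSE filter $U = R_h (R_h + \sigma_w^2 I)^{-1}$ for the full scheme and, for the diagonal scheme, the per-element gain $[R_h]_\ell / ([R_h]_\ell + \sigma_w^2)$ that results from restricting $U$ to be diagonal. It omits the IDFT pre-processing, so it is an illustration of the principle rather than a reproduction of (10) and (12); all function names are hypothetical.

```python
import numpy as np


def lmmse_filter(r_h: np.ndarray, sigma2: float) -> np.ndarray:
    """Full LMMSE filter U = R_h (R_h + sigma2 I)^{-1}, computed with a linear solve."""
    a = r_h + sigma2 * np.eye(r_h.shape[0])
    # U a = R_h  <=>  a^T U^T = R_h^T
    return np.linalg.solve(a.T, r_h.T).T


def diagonal_mmse_filter(r_h: np.ndarray, sigma2: float) -> np.ndarray:
    """Best diagonal U: each element is scaled by [R_h]_ll / ([R_h]_ll + sigma2)."""
    d = np.real(np.diag(r_h))
    return np.diag(d / (d + sigma2))


def linear_mse(u: np.ndarray, r_h: np.ndarray, sigma2: float) -> float:
    """E||h - U y||^2 for y = h + w, evaluated in closed form from the covariances."""
    r_y = r_h + sigma2 * np.eye(r_h.shape[0])
    err = r_h - u @ r_h - r_h @ u.conj().T + u @ r_y @ u.conj().T
    return float(np.real(np.trace(err)))


def estimate(y_mat: np.ndarray, u: np.ndarray) -> np.ndarray:
    """H_hat = mat(U vec(Y)), with column-major vec(.) and its inverse mat(.)."""
    y_vec = y_mat.reshape(-1, order="F")
    return (u @ y_vec).reshape(y_mat.shape, order="F")


rng = np.random.default_rng(2)
n = 6                                       # e.g. a 3 x 2 channel matrix, vectorized
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
r_h = a @ a.conj().T / n                    # correlated channel covariance
sigma2 = 0.5
mse_full = linear_mse(lmmse_filter(r_h, sigma2), r_h, sigma2)
mse_diag = linear_mse(diagonal_mmse_filter(r_h, sigma2), r_h, sigma2)
print(mse_full <= mse_diag)                 # True: ignoring correlations cannot help
```

The full filter needs an $n \times n$ solve, whereas the diagonal one is a per-element scaling, which mirrors the accuracy/complexity tradeoff discussed in Remark 1.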
Remark 1. In the MMSE scheme, delay-domain and antenna-domain correlations are taken into account through the frequency-to-delay and antenna-to-angle IDFT operations, respectively. In addition, by designing the full matrix $U$, the correlations among all channel elements are accounted for. In contrast, the diagonal MMSE scheme only considers the delay-domain and antenna-domain correlations captured by the IDFTs, resulting in inferior performance compared with the MMSE scheme. However, the diagonal MMSE scheme has a significantly lower computational complexity: its complexity order is $\mathcal{O}(N_R N_C)$, whereas the MMSE scheme operates on the full $N_R N_C$-dimensional vector $y$.

Inspired by these schemes, we now introduce baseline schemes for the non-stationary scenario. Our approach considers delay-domain and antenna-domain correlations without requiring antenna-to-angle IDFT operations. We begin with the centralized scheme, followed by the fully decentralized scheme. Centralized Scheme: In the centralized scheme, we solve optimization problem (13), where $y = \mathrm{vec}(Y)$, $h = \mathrm{vec}(H)$, and $U$ is optimized to minimize the mean square error. More explicitly, $U$ has the block-diagonal structure
$$U = \mathrm{blkdiag}(U_1, \dots, U_{N_C}),$$
where $U_\ell \in \mathbb{C}^{N_R \times N_R}$ acts on the $\ell$-th path.

Remark 2. Compared with problem (11) for the centralized scheme in the stationary scenario, problem (13) involves two modifications. First, the antenna-to-angle-domain IDFT is removed, so the algorithm is applicable even when the antennas are not regularly deployed, i.e., not arranged as a uniform linear array (ULA) or uniform planar array (UPA). Second, each $U_\ell$ is optimized to estimate the $\ell$-th path by taking the correlation among all antennas into consideration.
Then, define $Y$ and $H$ accordingly, so that their $\ell$-th columns $y_\ell$ and $h_\ell$ correspond to the $\ell$-th path. By exploiting the block structure of $U$, problem (13) can be equivalently rewritten as (16), where $y_\ell$ and $h_\ell$ are the $\ell$-th columns of $Y$ and $H$, respectively. Owing to the decoupled structure of (16), each $U_\ell$ can be optimized individually, and the optimal $U_\ell \in \mathbb{C}^{N_R \times N_R}$ is the corresponding per-path linear MMSE filter. The channel estimate then follows accordingly, with a computational complexity order of $\mathcal{O}(N_C N_R^3)$.

Next, we describe the fully decentralized scheme. Fully Decentralized (FD) Scheme: In the FD scheme, the received signal of cluster $m$ is $Y_m = H_m + W_m$, where $M$ is the number of antenna clusters and each cluster consists of $N_r = N_R/M$ antennas. Each cluster uses only its own received signal to estimate its local channel. In particular, cluster $m$ solves an optimization problem analogous to (13), where $y_m = \mathrm{vec}(Y_m)$, $h_m = \mathrm{vec}(H_m)$, $I_{N_r}$ denotes the identity matrix of size $N_r$, and $U_{m,\ell}$ is optimized to estimate the $\ell$-th path using the antennas in cluster $m$. As in the centralized scheme, the problem decouples across paths, and cluster $m$ minimizes the mean square error between $h_{m,\ell}$ and $U_{m,\ell}\, y_{m,\ell}$, where $y_{m,\ell}$ and $h_{m,\ell}$ are the $\ell$-th columns of $Y_m$ and $H_m$, respectively. Similar to the centralized scheme, the optimal $U_{m,\ell}$ is obtained in closed form, and each cluster then estimates its local channel accordingly. The computational complexity order is $\mathcal{O}(M N_C N_r^3)$.

To show the efficacy of the baseline schemes, the associated NMSE performance is shown in Fig. 3. In the simulation, the channels are generated by the “3GPP-38.901-UMa-NLOS” model in “QuaDRiGa” with 256 antennas, 1024 subcarriers, and 20 MHz bandwidth [20]. The antennas are deployed as a uniform linear array. As the figure shows, the NMSE decreases as the SNR increases, the centralized scheme outperforms the FD scheme, and the NMSE gap grows with the number of clusters. This observation motivates the design of efficient distributed CE algorithms that approach the centralized scheme while keeping both the communication cost and the computational complexity low.
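The two baselines can be sketched under the assumption that the frequency-to-delay transform is unitary, so the delay-domain noise stays white with variance $\sigma_w^2$; each per-path filter is then an LMMSE filter built from a per-path covariance $R_\ell = \mathbb{E}[h_\ell h_\ell^H]$, of size $N_R \times N_R$ (centralized) or $N_r \times N_r$ (FD). The function names and the random covariances below are hypothetical. With $M = 1$ the FD scheme coincides with the centralized one; for $M > 1$ it discards the off-diagonal blocks of $R_\ell$, which is what produces the NMSE gap described above.

```python
import numpy as np


def lmmse(r: np.ndarray, sigma2: float) -> np.ndarray:
    """LMMSE filter R (R + sigma2 I)^{-1} for a path covariance R."""
    a = r + sigma2 * np.eye(r.shape[0])
    return np.linalg.solve(a.T, r.T).T


def centralized_estimate(y_delay, r_paths, sigma2):
    """Block-diagonal U: one N_R x N_R filter U_l per delay-domain column (path)."""
    h_hat = np.empty_like(y_delay)
    for path in range(y_delay.shape[1]):
        h_hat[:, path] = lmmse(r_paths[path], sigma2) @ y_delay[:, path]
    return h_hat


def fd_estimate(y_delay, r_paths, sigma2, num_clusters):
    """Each cluster filters its own rows using only its N_r x N_r block of R_l."""
    n_r = y_delay.shape[0] // num_clusters
    h_hat = np.empty_like(y_delay)
    for m in range(num_clusters):
        rows = slice(m * n_r, (m + 1) * n_r)
        for path in range(y_delay.shape[1]):
            u_ml = lmmse(r_paths[path][rows, rows], sigma2)
            h_hat[rows, path] = u_ml @ y_delay[rows, path]
    return h_hat


rng = np.random.default_rng(3)
n_rx, n_c, sigma2 = 8, 4, 0.1
a = rng.standard_normal((n_c, n_rx, n_rx)) + 1j * rng.standard_normal((n_c, n_rx, n_rx))
r_paths = np.einsum("lij,lkj->lik", a, a.conj()) / n_rx   # R_l = A_l A_l^H / N_R
y_delay = rng.standard_normal((n_rx, n_c)) + 1j * rng.standard_normal((n_rx, n_c))

h_cen = centralized_estimate(y_delay, r_paths, sigma2)
h_fd1 = fd_estimate(y_delay, r_paths, sigma2, num_clusters=1)  # identical to h_cen
h_fd4 = fd_estimate(y_delay, r_paths, sigma2, num_clusters=4)  # ignores inter-cluster correlation
```

The loop structure also makes the complexity orders visible: $N_C$ solves of size $N_R$ for the centralized scheme versus $M N_C$ solves of size $N_r$ for the FD scheme.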

Proposed Distributed CE Algorithm
In this section, we introduce the aggregation-based distributed scheme for the non-stationary scenario, taking the star network as an example. Let $D_m \in \{0,1\}^{N_r \times N_C}$ be a local windowing matrix at node $m$, $m = 1, \dots, M$. Each node $m$ obtains a hard-windowed signal
$$\bar{Y}_m = D_m \odot Y_m,$$
where $\odot$ denotes the element-wise product, and the hard-thresholding matrix $D_m$, with entries $[D_m]_{pn} \in \{0, 1\}$, $p = 1, \dots, N_r$, $n = 1, \dots, N_C$, decides which elements of $Y_m$ are preserved. Recall that in the stationary scenario, hard windowing exploits the column-sparsity structure of the delay-domain channel. In the non-stationary case, however, some scatterers may be seen by only part of the antennas, so strict stationarity of the delay-domain channel may not hold; in other words, some columns of the delay-domain channel may be only partially sparse. For these reasons, hard windowing in the non-stationary scenario is performed in two steps: 1) Inter-column sparsity-aware hard thresholding: a column-wise hard thresholding operation is carried out to obtain a column-sparse signal, where the $n$-th column $[D_m]_n$ is set to $\mathbf{1}_{N_r}$ if the power of the $n$-th column of $Y_m$ exceeds a threshold determined by the noise power $\sigma_w^2$ and the tunable hard-windowing parameter $\eta$, and to $\mathbf{0}_{N_r}$ otherwise. Here, $\mathbf{1}_{N_r}$ and $\mathbf{0}_{N_r}$ denote the all-one and all-zero column vectors of size $N_r$, respectively.
2) Intra-column sparsity-aware hard thresholding: an element-wise hard thresholding operation is applied to the obtained column-sparse signal, so that within each preserved column only the entries $[D_m]_{pn}$ of sufficiently strong elements are set to one. The signal removed by hard windowing, $\bar{Y}^{(c)}_m = Y_m - \bar{Y}_m$, is kept at the local node and used to estimate the corresponding channel components.
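The two thresholding steps can be sketched as below. The exact rules (23) and (24) are not reproduced here; as a hypothetical stand-in, this sketch keeps a column when its average power exceeds $\eta\sigma_w^2$ and, inside kept columns, keeps an element when its own power exceeds $\eta\sigma_w^2$. The function name `two_step_window` and both rules are assumptions for illustration.

```python
import numpy as np


def two_step_window(y_m: np.ndarray, sigma2: float, eta: float):
    """Hypothetical two-step hard windowing of a local delay-domain signal Y_m (N_r x N_C).

    Returns the 0/1 windowing matrix D_m, the uploaded signal D_m ⊙ Y_m and the
    removed signal Y_m - D_m ⊙ Y_m that stays at the local node.
    """
    power = np.abs(y_m) ** 2
    # Step 1 (inter-column): keep a whole column if its average power is above eta * sigma2.
    col_keep = power.mean(axis=0) > eta * sigma2
    # Step 2 (intra-column): within kept columns, keep only entries above eta * sigma2.
    d_m = (power > eta * sigma2) & col_keep[np.newaxis, :]
    y_bar = np.where(d_m, y_m, 0)
    y_removed = y_m - y_bar
    return d_m.astype(int), y_bar, y_removed


rng = np.random.default_rng(4)
sigma2 = 0.01
y_m = np.sqrt(sigma2 / 2) * (rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16)))
y_m[:, 5] = [3.0, 3.0, 0.0, 0.0]   # a path seen only by the first two antennas of this node
d_m, y_bar, y_removed = two_step_window(y_m, sigma2, eta=10.0)
print(d_m[:, 5])                   # [1 1 0 0]
```

Column 5 survives the inter-column test, but the intra-column test then drops the two antennas that do not see the path, which is the partial sparsity the second step is meant to capture.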
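After hard thresholding, local node $m$ sends $\bar{Y}_m$ to the central node, which merges the received (sparse) signals by stacking them as $\bar{Y} = [\bar{Y}_1^T, \dots, \bar{Y}_M^T]^T \in \mathbb{C}^{N_R \times N_C}$. The optimal $U^\star$ is obtained in the same way as in (16).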
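The central node then obtains the estimated antenna-delay-domain channel $\hat{H} = \mathrm{mat}(U^\star \bar{y})$, where $\bar{y} = \mathrm{vec}(\bar{Y})$, and sends the block $\hat{H}_m$ corresponding to the antennas of node $m$ back to local node $m$, $m = 1, \dots, M$.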
Each local node then uses $\bar{Y}^{(c)}_m$ to estimate the corresponding channel components by solving a local MMSE problem, combines the resulting estimate with $\hat{H}_m$ to form its antenna-delay-domain local estimate, and finally performs a delay-to-frequency-domain DFT to obtain the local channel estimate.
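The whole aggregation-based pipeline can be sketched end to end under several assumptions that go beyond the text: the delay-domain noise is white with variance $\sigma_w^2$, the per-path filters are LMMSE filters built from covariances $R_\ell$, the windowing rule is the hypothetical average-power/element-power test against $\eta\sigma_w^2$, and each local node simply adds its estimate from $\bar{Y}^{(c)}_m$ to the block $\hat{H}_m$ returned by the central node (the text does not spell out the combination rule). All function names are hypothetical. Under these assumptions, the two limiting cases of Remark 3 come out exactly: $\eta = 0$ reproduces the centralized estimate and a very large $\eta$ reproduces the FD estimate.

```python
import numpy as np


def lmmse(r: np.ndarray, sigma2: float) -> np.ndarray:
    """Per-path LMMSE filter R (R + sigma2 I)^{-1} (assumes white delay-domain noise)."""
    a = r + sigma2 * np.eye(r.shape[0])
    return np.linalg.solve(a.T, r.T).T


def window(y_m: np.ndarray, sigma2: float, eta: float) -> np.ndarray:
    """Hypothetical two-step hard windowing; returns D_m ⊙ Y_m."""
    power = np.abs(y_m) ** 2
    col_keep = power.mean(axis=0) > eta * sigma2              # inter-column step
    keep = (power > eta * sigma2) & col_keep[np.newaxis, :]   # intra-column step
    return np.where(keep, y_m, 0)


def aggregation_ce(y_delay, r_paths, sigma2, num_clusters, eta):
    """Sketch of aggregation-based distributed CE on a star network (complex y_delay)."""
    n_rx, n_c = y_delay.shape
    n_r = n_rx // num_clusters
    blocks = [slice(m * n_r, (m + 1) * n_r) for m in range(num_clusters)]

    # Local nodes: upload Ybar_m = D_m ⊙ Y_m and keep the removed part Y_m - Ybar_m.
    y_bar = [window(y_delay[b], sigma2, eta) for b in blocks]
    y_rem = [y_delay[b] - yb for b, yb in zip(blocks, y_bar)]

    # Central node: stack the sparse uploads and apply the N_R x N_R filter of each path.
    y_stack = np.vstack(y_bar)
    h_central = np.empty_like(y_delay)
    for path in range(n_c):
        h_central[:, path] = lmmse(r_paths[path], sigma2) @ y_stack[:, path]

    # Local nodes: filter the removed part with local N_r x N_r filters and
    # add the block H_hat_m returned by the central node (assumed combination rule).
    h_hat = np.empty_like(y_delay)
    for b, y_r in zip(blocks, y_rem):
        for path in range(n_c):
            u_local = lmmse(r_paths[path][b, b], sigma2)
            h_hat[b, path] = h_central[b, path] + u_local @ y_r[:, path]
    return h_hat


# Usage on random data: N_R = 8 antennas, N_C = 4 delay taps, M = 2 nodes.
rng = np.random.default_rng(6)
n_rx, n_c, num_clusters, sigma2 = 8, 4, 2, 0.1
a = rng.standard_normal((n_c, n_rx, n_rx)) + 1j * rng.standard_normal((n_c, n_rx, n_rx))
r_paths = np.einsum("lij,lkj->lik", a, a.conj()) / n_rx
y_delay = rng.standard_normal((n_rx, n_c)) + 1j * rng.standard_normal((n_rx, n_c))
h_eta0 = aggregation_ce(y_delay, r_paths, sigma2, num_clusters, eta=0.0)   # everything uploaded
h_big = aggregation_ce(y_delay, r_paths, sigma2, num_clusters, eta=1e8)    # nothing uploaded
```

The final delay-to-frequency DFT is omitted here; it would be applied per node to each row block of `h_hat`.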
Table 1. Computational complexity of the baseline schemes and the aggregation-based scheme in the non-stationary scenario, where $\bar{N}_C$ is the number of columns preserved after hard windowing.
Remark 3. A key advantage of the proposed distributed CE algorithm is that it achieves a flexible tradeoff between estimation accuracy and inter-BBU communication cost. This tradeoff is controlled by the hard thresholding parameter $\eta$ in (23) and (24), and the centralized and FD algorithms are included as special cases. For instance, setting $\eta = 0$ uploads all local information to the central node, which makes the algorithm equivalent to the centralized algorithm. Conversely, for a sufficiently large $\eta$, no information is uploaded to the central node and each local node estimates its channel independently from its locally received signal, so the algorithm reduces to the FD algorithm.

Communication Cost and Computational Complexity
In this section, we evaluate the communication cost and computational complexity of the proposed algorithm.

Communication Cost
The communication cost is measured by the number of exchanged real values. Based on the star network architecture, the total number of real values exchanged by the centralized scheme is denoted by $C_{\rm c}$, and that of the proposed distributed algorithm by $C_{\rm dce}$.
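To make the accounting concrete, the following sketch counts uplink real values under a simplified model that is an assumption rather than the paper's formula: each complex entry counts as two real values, the centralized scheme uploads every entry of every $Y_m$, the proposed scheme uploads only the nonzero entries of each $\bar{Y}_m$, and both the signaling of preserved positions and the download of $\hat{H}_m$ are ignored. The function names and the 10% keep ratio are hypothetical.

```python
import numpy as np


def centralized_uplink_reals(n_rx: int, n_c: int) -> int:
    """All N_R x N_C complex samples are uploaded; 2 real values per complex sample."""
    return 2 * n_rx * n_c


def distributed_uplink_reals(y_bars) -> int:
    """Only the entries preserved by hard windowing (nonzeros of each Ybar_m) are uploaded."""
    return 2 * sum(int(np.count_nonzero(yb)) for yb in y_bars)


# Example: N_R = 256, N_C = 1024, M = 16 nodes, each keeping about 10% of its columns.
n_rx, n_c, num_clusters = 256, 1024, 16
n_r = n_rx // num_clusters
y_bars = []
for _ in range(num_clusters):
    yb = np.zeros((n_r, n_c), dtype=complex)
    yb[:, : n_c // 10] = 1.0 + 1.0j         # placeholder values in the preserved columns
    y_bars.append(yb)

c_c = centralized_uplink_reals(n_rx, n_c)
c_dce = distributed_uplink_reals(y_bars)
print(c_c, c_dce, round(c_dce / c_c, 4))    # 524288 52224 0.0996
```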

Computational Complexity
The computational complexity is measured by the number of real multiplications. The computational complexity of the baseline schemes and the proposed algorithm is summarized in Table 1. As can be seen, the aggregation-based scheme has a lower computational complexity than the centralized scheme owing to the sparsity after hard windowing. For example, with $N_R = 256$, $N_C = 816$, $\bar{N}_C = N_C/10$ and $M = 16$, the aggregation-based distributed scheme requires only 23.27% of the computation of the centralized scheme, and the central node accounts for 39.08% of the total computation. For $M = 16$, by contrast, the overall computation decreases to 18.47% of the centralized scheme, but more computation (49.23% of the total) is moved to the central node.

Numerical Simulations
In this section, the efficacy of the proposed distributed channel estimation algorithm is evaluated numerically. In the simulations, the channel is generated by the “3GPP-38.901-UMa-NLOS” model in “QuaDRiGa” [20], and the key parameters are summarized in Table 2. The channel covariance is assumed to be known and is approximated by averaging over $L$ channel realizations ($L = 10$ in the following simulations). The channel estimation accuracy is evaluated by the normalized mean square error (NMSE), given by $\mathrm{NMSE} = \|\hat{H} - H\|_F^2 / \|H\|_F^2$. The NMSE performance versus communication cost of the proposed distributed CE algorithm is shown in Fig. 4 and Fig. 5. The y-axis represents the NMSE gap between the proposed algorithm and the centralized algorithm, and the x-axis is the communication cost normalized by that of the centralized scheme, $C_{\rm dce}/C_{\rm c}$, obtained by varying the hard windowing parameter $\eta$. Specifically, for SNR = 20 dB and SNR = −20 dB, $\eta$ is set to [$10^8$, 25, 15, 10, 7, 5, 3, 1, 0.5, 0.2] and [$10^8$, 0.9, 0.7, 0.5, 0.3, 0.1, 0.08, 0.06, 0.04, 0.02], respectively. From Fig. 4 and Fig. 5, we make the following observations: i) the proposed algorithm realizes a flexible tradeoff between NMSE performance and communication cost; ii) the NMSE performance improves as the communication cost increases, since more information from the other nodes is used to estimate the local channels; and iii) the required communication cost increases with the SNR, because at high SNR more information must be preserved after hard windowing to guarantee the estimation accuracy.
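The accuracy metric and the covariance approximation described above can be sketched as follows. The helper names `nmse`, `nmse_db` and `sample_covariance` are hypothetical, and reporting the NMSE in dB is a presentation choice rather than something stated in the text.

```python
import numpy as np


def nmse(h_hat: np.ndarray, h: np.ndarray) -> float:
    """NMSE = ||H_hat - H||_F^2 / ||H||_F^2."""
    return float(np.linalg.norm(h_hat - h, "fro") ** 2 / np.linalg.norm(h, "fro") ** 2)


def nmse_db(h_hat: np.ndarray, h: np.ndarray) -> float:
    """NMSE expressed in dB."""
    return 10.0 * np.log10(nmse(h_hat, h))


def sample_covariance(realizations: np.ndarray) -> np.ndarray:
    """Approximate R = E[h h^H] by averaging over L realizations (rows of an L x n array)."""
    num_real = realizations.shape[0]
    return realizations.T @ realizations.conj() / num_real


rng = np.random.default_rng(5)
h = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
print(round(nmse_db(0.9 * h, h), 6))        # -20.0: a 10% amplitude error gives NMSE = 0.01
r_hat = sample_covariance(rng.standard_normal((10, 4)) + 1j * rng.standard_normal((10, 4)))
```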

Conclusions
In this paper, our focus was on developing distributed channel estimation algorithms for SNS channels, which present a new challenge in wireless communication systems.To address this challenge, we proposed a novel distributed CE algorithm that can be applied even when the antennas are irregularly deployed.This scheme was designed to take into account the spatial non-stationarity of the channel, which can cause significant degradation in the performance of traditional channel estimation algorithms.To evaluate the effectiveness of our proposed scheme, we compared its performance to that of existing algorithms in both stationary and non-stationary scenarios.Remarkably, even when the antennas are regularly deployed, our proposed scheme demonstrated superior performance compared to existing algorithms in the stationary scenario.These findings suggest that our proposed algorithm can significantly improve the accuracy of channel estimation in SNS channels, even in scenarios where the antennas are irregularly deployed.

Figure 1.Illustration of the DBP architecture, where the "RF chain" is the radio frequency chain connecting to each antenna.

Figure 2. Illustration of the spatially non-stationary wireless channel, where (a) shows that, owing to the large array size, only part of the antenna elements can see the scatterers, and (b) shows that different antennas have different delay-domain sparsity patterns.

Figure 3. NMSE performance of the baseline schemes in nonstationary scenario.

Table 2. Summary of the channel generation parameters