Superresolution Reconstruction of Magnetic Resonance Images Based on a Nonlocal Graph Network

INTRODUCTION: High-resolution (HR) medical images are very important for doctors when diagnosing the internal pathological structures of patients and formulating precise treatment plans. OBJECTIVES: Other superresolution methods cannot adequately capture the nonlocal self-similarity information of images. To solve this problem, we propose using graph convolution to capture nonlocal self-similarity information. METHODS: This paper proposes a nonlocal graph network (NLGN) to perform single magnetic resonance (MR) image SR. Specifically, the proposed network comprises a nonlocal graph module (NLGM) and a nonlocal graph attention block (NLGAB). The NLGM is designed with densely connected residual blocks, which can fully explore the features of input images and prevent the loss of information. The NLGAB is presented to efficiently capture the dependency relationships among the given data by merging a nonlocal operation (NL) and a graph attention layer (GAL). In addition, to enable the current node to aggregate more beneficial information, we aggregate only the neighbor nodes that are closest to the current node during information aggregation. RESULTS: For the scale r = 2, the proposed NLGN achieves a PSNR of 38.54 dB and an SSIM of 0.9818 on the T(T1, BD) dataset, yielding improvements of 0.27 dB and 0.0008, respectively, over the CSN method. CONCLUSION: The experimental results obtained on the IXI dataset show that the proposed NLGN performs better than the state-of-the-art methods.


Introduction
In recent years, medical imaging technologies, such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), have played important roles in scientific research and clinical medicine. Notably, because magnetic resonance (MR) images have the advantage of producing clear images with high soft tissue contrast and distinct characteristics, they have gradually become the main data source for model training in medical auxiliary systems based on deep learning.
Resolution is one of the most important measures for MR images, and high-resolution (HR) MR images are especially helpful for clinicians when performing diagnoses. However, the acquisition of HR MR images may increase the system cost, lengthen the scanning time, and reduce the signal-to-noise ratio (SNR) due to hardware limitations, body motion, and imaging time. To address these problems, superresolution (SR) reconstruction technology, which recovers HR images from low-resolution (LR) images, can be applied to LR MR images so that higher-quality MR images can be obtained under the same imaging environment and hardware equipment.
Traditional SR methods are mainly based on interpolation [9], reconstruction [10,11], and shallow learning [12,13]. Although interpolation-based methods are computationally simple and efficient, they struggle to recover image details and often lead to artifacts in reconstructed MR images. Reconstruction-based methods can overcome the unfavorable oversmoothing effects of interpolation-based methods; however, they depend on accurate image registration. Methods based on shallow learning can achieve high-quality reconstructed images, but these methods have difficulty obtaining the optimal model parameters, and they are not suitable for most image reconstruction tasks.
With the successful application of deep learning technology in image classification [14][15][16][17][18][19], target detection [20][21][22], and image segmentation [23,24], this approach has also been applied in SR reconstruction tasks. The superresolution convolutional neural network (SRCNN) [25] was first proposed based on a CNN, and it has achieved better results than those of the traditional methods. The SR approach for very deep convolutional networks (VDSR) [26] was presented to increase network depth and reduce the training difficulty based on the residual connections of a residual network (ResNet) [27]. Lim et al. [28] built a deeper and better-performing network, the enhanced deep superresolution network (EDSR), by deleting the batch normalization (BN) layer of the residual block because they found that the BN layer was not adequate for reconstruction tasks. To improve information flow, the cascaded multiscale cross network (CMSCN) [29] was proposed and used to progressively cascade a series of subnetworks together to infer high-resolution features. The channel splitting network (CSN) [6] was presented and used to divide the features into two parts along the channel dimension and adopt different mechanisms to explore different information, thus realizing the different treatment of channel features. In addition, some other improved methods were introduced by increasing the depth of the base model or reusing the derived features to achieve improved reconstruction effectiveness; such networks include the deep recursive residual network (DRRN) [30], the residual dense network (RDN) [31] and the wide residual network with a fixed skip connection (FSCWRN) [7]. However, in the methods mentioned above, the spatial features are treated equally, and the dependencies among the pixels are not considered.
To fuse the dependencies among the pixels of an input image, more methods have been proposed by researchers. The nonlocal recurrent network (NLRN) [32] was proposed to capture the available nonlocal self-similarity information by combining a nonlocal (NL) operation module and a CNN. The residual nonlocal attention network (RNAN) [33] was proposed to improve the ability of models to capture local features via residual local and nonlocal attention blocks; this approach can also maintain the dependencies between the attention feature maps of images. Subsequently, the second-order attention network (SAN) [34] was designed with a nonlocally enhanced residual group structure and NLs to capture long-distance spatial contextual information.
Although nonlocal self-similarity was studied deeply in the NLRN [32], RNAN [33] and SAN [34], local convolution cannot describe the interrelationships between blocks; that is, it cannot deal with non-Euclidean data. To process non-Euclidean data, researchers introduced graph neural networks (GNNs). Graph convolution based on spectral graph theory was first proposed in convolutional GNNs (ConvGNNs) [35]. To overcome the high complexity of ConvGNNs, ChebNet [36] and the graph convolutional network (GCN) [37] were developed through approximations and simplifications. The graph attention network (GAT) [38] uses an attention mechanism to capture the similarity levels of neighbor nodes relative to the current node. Graph convolution has the advantages of possessing a strong representation ability, capturing dependency relationships, and aggregating and delivering information [39,40]. Therefore, it has also been studied in image processing tasks. To extract the correlations between features, Xu et al. [41] performed graph convolution on the features, thereby improving the SR reconstruction effect. Yan et al. [42] used the GAT [38] to explore the interrelationships between different subregions in the feature map, helping to restore the texture structure and improve the reconstruction effect. However, when the above graph convolution models aggregate information, the current node aggregates the information of all neighboring nodes, and the aggregated information may interfere with the node itself.
The current methods have the following shortcomings: (i) nonlocal self-similarity information cannot be fully captured by local convolution; (ii) most graph convolution operations may aggregate negative information into the current node. In view of the above problems, a nonlocal graph attention layer (NLGAL) is presented in this paper. It can fully capture the nonlocal self-similarity information of images by combining a GAL [38] and nonlocal self-similarity. Furthermore, a nonlocal graph network (NLGN) is designed based on the NLGAL for MR image reconstruction. The new model can capture the nonlocal self-similarity information and the remote dependencies between the obtained feature maps; hence, it efficiently improves the quality of SR reconstruction. In addition, unfavorable information is avoided when the current node aggregates information: in this study, a new strategy is used in which, when information is aggregated, the top k neighbor nodes that are most similar to the current node are selected for aggregation. The IXI dataset is chosen for verification and comparison with state-of-the-art methods, such as bicubic [9], the SRCNN [25], the VDSR [26], the RDN [31], the CMSCN [29], the FSCWRN [7] and the CSN [6]. The experimental results prove that the presented method achieves better reconstruction results than competing approaches.
The rest of this paper is organized as follows. In Section 2, we briefly review the related work. Sections 3 and 4 present a detailed analysis of the newly proposed schemes, followed by an extensive experimental comparison on the IXI dataset. Finally, we conclude our work.

Nonlocal Operation (NL)
The nonlocal operation was proposed in the NLRN [32], and it is formulated as follows:

Z = δ(X)^{-1} Φ(X) G(X),

where X ∈ R^{N×m} is the input of the NL, Z ∈ R^{N×k} is its output, N denotes the number of image pixels, m and k are the input feature length and the output feature length, and Φ(X) ∈ R^{N×N} is a nonlocal correlation matrix used to calculate the similarity relationship between the blocks. G(X) ∈ R^{N×k} is a nonlocal transformation matrix. The output is normalized by δ^{-1}, where δ is the normalization factor. As shown in Figure 1.a, the NL is implemented using 1×1 convolution kernels, where θ, φ and g are the weight parameters, ⊗ represents matrix multiplication, and ⊕ represents elementwise addition.
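As a concrete illustration, the embedded-Gaussian form of the NL can be sketched in NumPy on flattened pixel features, where the 1×1 convolutions θ, φ and g become plain matrix multiplications and a row-wise softmax plays the role of the δ⁻¹ normalization (a toy sketch, not the NLRN implementation; all names and shapes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_op(X, W_theta, W_phi, W_g):
    """Embedded-Gaussian nonlocal operation on flattened features.

    X: (N, m) pixel features; W_theta, W_phi: (m, d); W_g: (m, k).
    Returns Z: (N, k), each row a similarity-weighted sum over ALL pixels.
    """
    theta = X @ W_theta                      # (N, d) "query" embedding
    phi = X @ W_phi                          # (N, d) "key" embedding
    g = X @ W_g                              # (N, k) value transformation
    corr = softmax(theta @ phi.T, axis=1)    # (N, N) normalized correlation matrix
    return corr @ g                          # (N, k) aggregated output
```

Every output row mixes information from every pixel, which is exactly what a local convolution with a small receptive field cannot do.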

Graph Attention Layer (GAL)
For a set of input nodes, the GAL first calculates the attention coefficient between two nodes. For a node i and its neighbor node j, the attention coefficient e_ij between them is defined as follows [38]:

e_ij = a(W h_i, W h_j),

where a denotes the attention mechanism, h_i and h_j represent the feature vectors of nodes i and j, respectively, and W represents the linear transformation matrix.

Suppose that N_i represents all the neighbors of node i. To make the similarities of different nodes comparable, the softmax function is used for normalization; this function is given as:

α_ij = softmax_j(e_ij) = exp(e_ij) / Σ_{k∈N_i} exp(e_ik).

The attention mechanism a is a single-layer feedforward neural network. It is parameterized by a weight vector a and uses the leaky rectified linear unit (LeakyReLU) function for nonlinear activation. The calculation of the attention mechanism is shown in Figure 1.b. Therefore, α_ij can be further expressed as follows:

α_ij = exp(LeakyReLU(a^T [W h_i || W h_j])) / Σ_{k∈N_i} exp(LeakyReLU(a^T [W h_i || W h_k])),

where T represents the transposition operation and || is the concatenation operation.

Then, the updated feature h_i' can be calculated by the following formula:

h_i' = σ(Σ_{j∈N_i} α_ij W h_j),

where σ(·) stands for the nonlinear activation function.
The information aggregation process is shown in Figure 1.c.
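The attention computation and aggregation described above can be sketched in NumPy as follows (a dense toy version for small graphs; the weight shapes, the tanh choice for σ, and the `neighbors` dictionary are illustrative assumptions, not the GAT reference implementation):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, W, a, neighbors):
    """Single graph attention layer in the style of [38].

    H: (N, F) node features; W: (F, F2) shared linear map;
    a: (2*F2,) attention vector; neighbors: dict node id -> list of neighbor ids.
    """
    Wh = H @ W                                   # (N, F2) transformed features
    out = np.zeros((H.shape[0], W.shape[1]))
    for i in range(H.shape[0]):
        nbrs = neighbors[i]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each neighbor j of i
        e = np.array([leaky_relu(a @ np.concatenate([Wh[i], Wh[j]])) for j in nbrs])
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()                     # softmax over the neighborhood N_i
        # h_i' = sigma(sum_j alpha_ij * W h_j), with sigma = tanh here
        out[i] = np.tanh(sum(al * Wh[j] for al, j in zip(alpha, nbrs)))
    return out
```

Note that, unlike the NL above, each node only mixes information from its listed neighbors.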

Network Architecture
The NLGN model proposed in this paper is shown in Figure 2.a. Similar to other image SR models, the NLGN mainly consists of three parts: a feature extraction network, a nonlinear mapping network, and a reconstruction network. The feature extraction network is used to extract the shallow features of the input image X, and the extracted feature X_E can be represented as:

X_E = F_E(X),

where X represents a low-resolution input image and F_E(·) denotes the mapping function of the feature extraction network.

Nonlinear Mapping Network
The nonlinear mapping network contains a series of stacked nonlocal graph modules (NLGMs). X_E represents the input of the first NLGM, and the input and output of the i-th NLGM are X_{i-1} and X_i, respectively. Therefore, the output X_i of the i-th NLGM can be expressed as:

X_i = F_{NLGM,i}(X_{i-1}), with X_0 = X_E.

Then, the output of the last module, which is the output of the nonlinear mapping network, can be expressed as:

X_n = F_{NLGM,n}(F_{NLGM,n-1}(⋯ F_{NLGM,1}(X_E))),

where X_n represents the output of the n-th NLGM.

Reconstruction Network
The reconstruction network consists of an upsampling module and a 3×3 convolutional layer. The upsampling module first reconstructs the input feature maps into SR features through a subpixel shuffling layer [43]. Then, the network uses a 3×3 convolutional layer to build the SR features into the final output. The mapping function of the reconstruction network can be expressed as follows:

I_SR = F_R(X_n + X_E),

where X_n and X_E represent the deep and shallow features of the input image, respectively, I_SR represents the output, and F_R(·) represents the mapping function of the reconstruction network.
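The subpixel shuffling layer [43] used by the upsampling module can be sketched for a single image in NumPy: it rearranges C·r² low-resolution channels into C channels at r times the spatial resolution (a minimal sketch; real implementations operate on batched 4-D tensors):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Subpixel shuffling: (C*r^2, H, W) -> (C, H*r, W*r), as in ESPCN [43].

    out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w]
    """
    C2, H, W = x.shape
    C = C2 // (r * r)
    x = x.reshape(C, r, r, H, W)      # split channels into (C, i, j) offsets
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (C, H, i, W, j)
    return x.reshape(C, H * r, W * r)
```

Each group of r² channels thus fills one r×r patch of the upscaled feature map, so upsampling is learned entirely through the preceding convolutions.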

Nonlocal Graph Module
The architecture of the NLGM is shown in Figure 2.b. It is composed of a dense residual block [44] and a nonlocal graph attention block (NLGAB).

Dense Residual Block
The dense residual block is composed of three densely connected residual blocks, and each residual block has two convolutional layers and two ReLU activation functions. The output X_DR of the dense residual block can be expressed as:

X_DR = F_DRB(X_{i-1}),

where F_DRB(·) denotes the mapping function of the dense residual block. The dense connections prevent the loss of information during feature transfer. For more information, please refer to [44].
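Treating each convolution as a simple matrix map, the connectivity pattern can be sketched in NumPy; here the outputs of earlier blocks are fused by summation before entering the next block, which is an illustrative simplification of the dense connections of [44] (all names and the fusion rule are assumptions of this sketch):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """One residual block: two conv layers (modeled as matrix maps) + two ReLUs."""
    return x + relu(relu(x @ W1) @ W2)

def dense_residual_block(x, weights):
    """Three residual blocks with dense connections.

    Each block receives the fused outputs of the input and all previous blocks,
    so no intermediate feature is lost along the way.
    weights: list of three (W1, W2) pairs.
    """
    outputs = [x]
    for W1, W2 in weights:
        inp = sum(outputs)                   # dense connectivity: fuse all earlier features
        outputs.append(residual_block(inp, W1, W2))
    return outputs[-1]
```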

NLGAB
The NLGAB architecture is shown in Figure 3. It is mainly composed of an NLGAL and convolutions. The NLGAB first uses a 1×1 convolutional layer to reduce the dimensionality of X_DR. Then, it uses a 3×3 convolutional layer with a stride of 2 for downsampling, and the downsampled features are used as the input of the proposed NLGAL. After that, deconvolution is introduced to restore the generated output to its original size. Finally, a 1×1 convolutional layer is applied to compress and reconstruct the deconvolution result. The whole procedure can be expressed as follows:

X_NLGAB = F_{1×1}(F_{deconv}(F_{NLGAL}(F_{3×3↓2}(F_{1×1}(X_DR))))),

where F_{NLGAL}(·) represents the NLGAL and F_{1×1}(·), F_{3×3↓2}(·) and F_{deconv}(·) denote the 1×1 convolutions, the stride-2 3×3 convolution and the deconvolution, respectively.

Figure 3. Nonlocal graph attention block (NLGAB)
Generally, for an NLGM, the dense residual block is first used to process the input information. Then, the NLGAB is used to capture the nonlocal self-similarity information of X_DR. In addition, X_DR is further processed by a bottleneck layer for compression. Finally, the output of the NLGAB and the compression result are combined as the input of the next NLGM. The output of an NLGM can thus be expressed as:

X_i = F_{NLGAB}(X_DR) + F_B(X_DR),

where X_i is the output of the i-th NLGM and F_B(·) denotes the bottleneck layer.

Nonlocal Graph Attention Layer (NLGAL)
Both the GAL [38] and NLs [32] can capture context information and remote dependencies. However, they differ from each other. NLs aggregate the information between all blocks, whereas the GAL aggregates only the information between the neighboring nodes of each block. In addition, NLs cannot fully capture the relationship between two blocks by convolution, while the GAL can effectively characterize and calculate the relationship between two blocks (that is, between adjacent nodes). In general, the GAL is more suitable for capturing nonlocal information than NLs. However, the space and time complexity of the GAL is very large due to the use of a parameter matrix and a single-layer feedforward neural network. By combining NLs and the GAL, the NLGAL is proposed in this paper. Specifically, a linearly embedded Gaussian kernel is chosen to replace the calculation of the attention coefficient in the GAL, and convolution is used to realize the linear transformation. Formula (4) is rewritten as:

α_ij = exp((W_θ h_i)^T (W_φ h_j)) / Σ_{k∈N_i} exp((W_θ h_i)^T (W_φ h_k)),   (13)

where W, W_θ and W_φ are learnable parameters, and Formula (13) is used to calculate the similarity (that is, the attention coefficient) between nodes. The operation of the NLGAL is shown in Figure 4. In addition, the GAL aggregates the information of all neighboring nodes relative to the current node when performing information aggregation. We believe this may cause the current node to aggregate information that is unfavorable to it: the similarities between distant nodes are actually very small, and aggregating such weakly similar nodes with the current node will interfere with the final information of the current node. To prevent this impact, when the NLGAL aggregates information, it aggregates only the top k nodes that are most similar to the current node.
Specifically, the k nodes with the largest attention weights are first selected; second, these weights are renormalized to obtain new attention weights; and finally, information aggregation is performed based on the new attention weights. It is worth noting that the NLGAL selects the nodes before normalizing because doing so assigns greater attention weights to the retained similar nodes.
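The three-step top-k strategy can be sketched in NumPy as follows (an illustrative sketch, not the paper's implementation; `values` stands for the transformed node features W h_j and `att` for the raw attention scores):

```python
import numpy as np

def topk_aggregate(values, att, k):
    """Keep only the k largest attention scores per node, renormalize, aggregate.

    values: (N, F) transformed node features; att: (N, N) raw attention scores.
    Selecting the top k BEFORE the softmax gives the kept neighbors larger
    weights than they would receive under a softmax over all nodes.
    """
    out = np.zeros_like(values)
    for i in range(att.shape[0]):
        idx = np.argpartition(att[i], -k)[-k:]        # indices of the k largest scores
        w = np.exp(att[i, idx] - att[i, idx].max())
        w /= w.sum()                                  # softmax over only the kept scores
        out[i] = w @ values[idx]                      # aggregate the k nearest neighbors
    return out
```

With k equal to the number of nodes, this reduces to ordinary full-neighborhood attention; smaller k discards the weakly similar, distant nodes that the paper argues are unfavorable.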

Loss Function
In view of the better performance of the L1 loss function [28], we chose it as the loss function of our NLGN. For a given training set {(I_LR^i, I_HR^i)}_{i=1}^M, where M is the total number of training samples, the L1 loss function is expressed as follows:

L(θ) = (1/M) Σ_{i=1}^{M} ‖ F_NLGN(I_LR^i; θ) − I_HR^i ‖_1,

where θ denotes the model parameters and F_NLGN(·) denotes the mapping function of the NLGN.
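For reconstructed and ground-truth images stored as arrays, the L1 loss reduces to the mean absolute error (a minimal sketch):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between reconstructed and ground-truth images."""
    return np.mean(np.abs(pred - target))
```

Compared with the squared (L2) error, the absolute error penalizes large residuals less aggressively, which is one common explanation for the sharper textures reported in [28].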

Dataset and Implementation Details
In this study, we chose the same IXI dataset as that used for the experiments with the CSN [6]. The dataset contains MR images of three types: T1-weighted, T2-weighted, and proton density (PD)-weighted images. In addition, two degradation models, bicubic downsampling (BD) and k-space truncation degradation (TD), were chosen to simulate the LR images. For convenience, the subdatasets with specific types and degradation levels are expressed in simplified forms (e.g., T(T1, BD)). The learning rate was initialized to 10^-4, and it was reduced by half every 2×10^5 iterations. In addition, data augmentation measures, such as random horizontal flipping, vertical flipping and 90° rotation, were chosen to ensure the diversification of the training data and to prevent overfitting.
All experiments were conducted with PyTorch 1.5.0 on a PC with a 2.2 GHz Intel(R) Xeon(R) E5-2650 CPU, 96 GB of RAM and an NVIDIA TITAN V GPU (12 GB of memory). All the compared methods mentioned in this study were trained for one million iterations. Finally, it is worth stating that all the experimental data listed in this study are the average results over the IXI test dataset.

Evaluation Metrics
To evaluate the performance of the tested models, the peak SNR (PSNR) and structural similarity index (SSIM) [45] were chosen as evaluation metrics.
The PSNR is defined as:

PSNR = 10 · log10(L² / MSE),

where L represents the maximum image pixel value and MSE denotes the mean squared error between the ground truth (GT) and the reconstructed image.
The SSIM is defined as:

SSIM(x, y) = (2μ_x μ_y + c_1)(2σ_xy + c_2) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2)),

where μ_x and μ_y are the means of the GT and reconstructed images, σ_x² and σ_y² are their variances, σ_xy is their covariance, and c_1 and c_2 are small constants that stabilize the division [45].
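As a sanity check, the PSNR can be computed directly from its definition (a minimal sketch; L = 255 by default, as for 8-bit images):

```python
import numpy as np

def psnr(gt, rec, L=255.0):
    """Peak signal-to-noise ratio in dB between ground truth and reconstruction."""
    mse = np.mean((gt.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")           # identical images: infinite PSNR
    return 10.0 * np.log10(L * L / mse)
```

Because the PSNR depends only on the pixelwise MSE, the SSIM is reported alongside it to capture the structural quality that the MSE ignores.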

Ablation Experiments
In this section, the performance of the proposed NLGAL was evaluated first. Then, the selection of the number of neighbor nodes k in the NLGAL was tested. Finally, we tested the impact of the number of NLGMs in the NLGN on the reconstruction performance. The PD-weighted MR images under BD were chosen as the dataset, and r =2 in the experiments.

Effectiveness of the NLGAL
To evaluate the effectiveness of the NLGAL, five structures were designed for comparison: a) the proposed NLGAB was not included in the model; b) the proposed NLGAL was not included in the model; c) Figure 1.a was used as the NLGAL; d) Figure 1.b was chosen as the NLGAL; e) Figure 4 was adopted as the NLGAL, and the strategy of utilizing the k nearest neighbor nodes was not considered. For convenience, the five structures were termed BL (baseline), BL+ (baseline+), BL+NL (baseline+NL), BL+GAL (baseline+GAL), and NLGN-. The experimental results are shown in Table 1. It is apparent from Table 1 that NLGN- produced the best reconstruction results. Compared with BL+NL, it achieved improvements of approximately 1.04 dB and 0.0034 in the PSNR and SSIM metrics, respectively. In addition, the results also show that BL and BL+ performed better than BL+NL. This reveals that fusing information from all neighborhood nodes is not the best choice for reconstruction, which is the main reason why we proposed selecting the top k nearest neighbor nodes for fusion.
In addition, the reconstruction result of BL+GAL is stronger than that of BL+NL because the GAL aggregates only the neighbors of the current node when performing information aggregation, while the NL aggregates information from all nodes. The reconstruction effect of the NLGN proposed in this study is better than that of BL+GAL, mainly because the NLGAL can learn more comprehensive nonlocal self-similarity information than the GAL.

Selection of k
To verify the impact of k on the reconstruction results, we designed two groups of experiments. We first set k = 5, 10 and 15 for comparison and then refined k according to the results. Table 2 provides the experimental comparison of k = 5, 10 and 15 against NLGN-. It can be seen from Table 2 that both the PSNR and SSIM decreased as k increased. The best scores were achieved at k = 5, which were approximately 0.0178 dB higher than those of NLGN- in terms of the PSNR metric. These results support the effectiveness of the proposed strategy of selecting only the k nearest neighbors for fusion. However, at k = 15 the reconstruction effect was worse than that of NLGN-, which does not use the strategy; this is mainly because selecting the first k nodes increases the weights of those nodes when they are aggregated. To further refine the value of k, we set k = 5, 6, 7, 8 and 9 according to the results in Table 2. The experimental results are given in Table 3. As seen in Table 3, the proposed model achieved the best scores at k = 8, and in the range of 5-10, both the PSNR and SSIM first rose and then fell; this trend can be observed more intuitively in Figure 6. Therefore, we chose k = 8 in the subsequent experiments. The two sets of experiments suggest the following explanation. Initially, increasing k improves the reconstruction effect because information from more nearby neighbor nodes can be aggregated; this is why the evaluation indices improve from k = 5 to k = 8. As k continues to increase, however, the weight given to the nearest neighbors is diluted, and the current node begins to aggregate information from relatively distant, weakly similar nodes, that is, unfavorable information. Therefore, as k increases further, the objective indicators show a downward trend, which is also why the result at k = 15 is lower than that obtained without the strategy.

Selection of the Number of NLGMs
In this subsection, we evaluated the impact of the number of NLGMs on the NLGN. In the experiments, the number of NLGMs was separately set to 2, 3, 4 and 5. Figure 7 displays the PSNR comparison curves with respect to the number of iterations on the T(PD, BD) dataset with SR ×2, and Figure 8 shows the parameter comparisons. It is clear that the reconstruction performance improved gradually as the number of NLGMs increased; however, this led to a large increase in the number of parameters. Balancing these findings, we chose 4 NLGMs in the following experiments.

Comparison with the State-of-the-art Methods
To further prove the effectiveness of the proposed NLGN, it was compared with state-of-the-art methods, including bicubic [9], the SRCNN [25], the VDSR [26], the RDN [31], the CMSCN [29], the FSCWRN [7] and the CSN [6]. Bicubic downsampling, which reduces HR images to LR images with a bicubic kernel, is a widely used method of simulating LR images in the SR field. In this section, we first conduct experiments on the LR images generated by bicubic downsampling. Table 4 gives the PSNR and SSIM comparisons for the abovementioned methods on different dataset types and scales under BD. It can be seen that the proposed NLGN achieved the best scores because it fully captures and utilizes information from the image itself. Compared with the competing methods, the proposed NLGN achieves improvements of varying degrees on the other scales and datasets. For a scale of r = 2, the proposed NLGN yielded an improvement of 0.27 dB on the T(T1, BD) dataset over the results of the CSN [6]. As a whole, the images reconstructed by the NLGN contain more texture details and shapes than those yielded by the other tested methods.
The k-space truncation of an HR image simulates the actual image acquisition process, in which the LR image is scanned by reducing the acquisition lines in the phase and slice encoding directions [6]. For the same scaling factor, an LR image generated by truncation degradation loses more information than one generated by BD; this is demonstrated by the fact that bicubic interpolation works better under bicubic downsampling than under truncation degradation. Table 5 shows the PSNR and SSIM comparisons for the mentioned methods on different dataset types and scales under truncation degradation. It can also be seen that the proposed NLGN achieved the best scores in our experiments because it uses graph convolution to fully capture nonlocal self-similarity information.
For a scale of r = 4, the proposed NLGN yielded an improvement of 0.12 dB on the T(PD, TD) dataset over the results of the CSN [6]. In addition, the improvement achieved by the NLGN model under truncation degradation is larger than that under bicubic downsampling, which suggests that the proposed NLGN is well suited for MR image superresolution. The middle images show the results produced on the T(T1, TD) dataset with SR ×2, and the bottom images show the visual effects obtained on the T(T2, TD) dataset with SR ×4. We can reach the same conclusion as that of the experiments on the BD datasets: in the region indicated by the red arrow in the first row, the content reconstructed by the proposed NLGN is the closest to the reference image and better than that of the other methods, and the middle and bottom rows likewise show that the reconstruction effect of the NLGN is better than that of the other methods.
We further compared the complexity of the methods discussed in this paper. The comparison results obtained on T(PD, BD) with SR ×2 in terms of the number of parameters and the PSNR are shown in Figure 11. It can be seen that the NLGN has the best performance because it captures the nonlocal self-similarity information in the image and makes full use of it. The NLGN model contains only 6.97 M parameters, while the CSN model with similar performance contains 13.64 M; that is, the NLGN has roughly half as many parameters as the CSN but achieves better performance. The RDN model contains 22.06 M parameters, so the NLGN has far fewer parameters than the RDN yet performs much better. This proves an advantage of the NLGN from another perspective: the NLGN can achieve higher performance with fewer parameters because graph convolution excavates more nonlocal self-similarity information.

Conclusion
In this paper, MR image reconstruction based on nonlocal self-similarity was studied in detail. A novel model, an NLGN, was introduced to improve the quality of reconstructed images based on the combination of a graph network and an attention mechanism. First, dense residual blocks were introduced to prevent information loss during the feature transfer procedures. Then, we designed an NLGAL via the combination of NLs and a GAL, which is more effective than other approaches for mining nonlocal self-similarity information. Furthermore, we chose the k most similar nodes for fusion to reduce the impacts of the neighbors with small similarity values. The experimental results showed that the proposed model yielded the best scores among the state-of-the-art methods. In this study, we also demonstrated the effectiveness of using graph convolution to mine nonlocal self-similarity information.
Currently, this work only conducts experiments on 2D slices of MR images, ignoring the structural information and implied nonlocal information in 3D volumes. MR images are mostly acquired as 3D volumes, which contain more nonlocal and structural information in 3D space. In future work, we will study how to apply the NLGN to MR images in 3D volumes. In this paper, we demonstrated the effectiveness of using graph convolution to mine nonlocal self-similarity information, which may provide some inspiration for the development of other methods for mining such information.